(57) Abstract

A novel, generally applicable method for producing correctly folded proteins from a mixture of misfolded proteins, e.g. bacterial
inclusion-body aggregates. A major new aspect of the method is that overall efficiency is achieved by subjecting proteins to a time
sequence of multiple denaturation-renaturation cycles, resulting in gradual accumulation of the correctly folded protein. The method has
proven efficient for a variety of recombinant proteins. Also provided are novel encrypted recognition sites for bovine coagulation factor
Xa. The encrypted recognition sites described may be activated in vitro by controlled oxidation or by reversible derivatization of cysteine
residues and thereby generate new cleavage sites for factor Xa. Two new recombinant serine proteases exhibiting narrow substrate specificity
for factor Xa recognition sites are also provided. They may replace natural coagulation factor Xa for cleavage of chimeric proteins.

IMPROVED METHOD FOR THE REFOLDING OF PROTEINS
FIELD OF THE INVENTION

This invention relates to recombinant DNA technology and, in
particular, to protein engineering technologies for the
production of correctly folded proteins by expression of genes
or gene fragments in a host organism, heterologous or
homologous, as recombinant protein products, by describing
novel general principles and methodology for efficient in
vitro refolding of misfolded and/or insoluble proteins,
including proteins containing disulphide bonds. This
invention further relates to the refolding of unfolded or
misfolded polypeptides of any other origin. The invention also
relates to novel designs of encrypted recognition sites for
factor Xa cleavage of chimeric proteins, sites that only
become recognized after in vitro derivatization. Two
analogues of bovine coagulation factor Xa, suitable for small-,
medium-, or large-scale technological applications involving
specific cleavage of chimeric proteins at sites designed for
cleavage by factor Xa, are provided, too. Finally, the
invention relates to designs of reversible disulphide-blocking
reagents, useful as auxiliary compounds for refolding of
cysteine-containing proteins, including a general assay
procedure by which such disulphide exchange reagents can be
evaluated for suitability for this specific purpose.

GENERAL BACKGROUND OF THE INVENTION

Technologies for the production of virtually any polypeptide
by introduction, by recombinant DNA methods, of a natural or
synthetic DNA fragment coding for this particular polypeptide
into a suitable host have been under intense development over
the past fifteen years, and are at present essential tools
for biochemical research and for a number of industrial
processes for production of high-grade protein products for
biomedical or other industrial use.

Four fundamental properties of biological systems render
heterologous production of proteins possible:



(i) The functional properties of a protein are entirely
specified by its three-dimensional structure, and, due to the
molecular environment in the structure, manifested by
chemical properties exhibited by specific parts of this structure.

(ii) The three-dimensional structure of a protein is, in
turn, specified by the sequence information represented by
the specific sequential arrangement of amino acid residues in
the linear polypeptide chain(s). The structure information
embedded in the amino acid sequence of a polypeptide is by
itself sufficient, under proper conditions, to direct the
folding process, of which the end product is the completely
and correctly folded protein.

(iii) The linear sequence of amino acid residues in the
polypeptide chain is specified by the nucleotide sequence in
the coding region of the genetic material directing the
assembly of the polypeptide chain by the cellular machinery.
The translation table governing translation of nucleic acid
sequence information into amino acid sequence is known and is
almost universal among known organisms and hence allows
nucleic acid segments coding for any polypeptide segment to
direct assembly of polypeptide product across virtually any
cross-species barrier.

(iv) Each type of organism relies on its own characteristic
array of genetic elements present within its own genes to
interact with the molecular machinery of the cell, which in
response to specific intracellular and extracellular factors
regulates the expression of a given gene in terms of
transcription and translation.



In order to exploit the protein synthesis machinery of a host
cell or organism to achieve substantial production of a
desired recombinant protein product, it is therefore
necessary to present the DNA segment coding for the desired
product to the cell fused to control sequences recognized by the
genetic control system of the cell.

The immediate fate of a polypeptide expressed in a host is
influenced by the nature of the polypeptide, the nature of
the host, and possible host organism stress states invoked
during production of a given polypeptide. A gene product
expressed at a moderate level and similar or identical to a
protein normally present in the host cell will often undergo
normal processing and accumulation in the appropriate
cellular compartment or secretion, whichever is the natural fate
of this endogenous gene product. In contrast, a recombinant
gene product which is foreign to the cell or is produced at
high levels often activates cellular defence mechanisms
similar to those activated by heat shock or exposure to toxic
amino acid analogues, pathways that have been designed by
nature to help the cell get rid of "wrong" polypeptide
material by controlled intracellular proteolysis or by
segregation of unwanted polypeptide material into storage
particles ("inclusion bodies"). The recombinant protein in these
storage particles is often deposited in a misfolded and
aggregated state, in which case it becomes necessary to
dissolve the product under denaturing and reducing conditions
and then fold the recombinant polypeptide by in vitro methods
to obtain a useful protein product.



Expression of eukaryotic genes in eukaryotic cells often
allows the direct isolation of the correctly folded and
processed gene product from cell culture fluids or from
cellular material. This approach is often used to obtain
relatively small amounts of a protein for biochemical studies
and is presently also exploited industrially for production
of a number of biomedical products. However, eukaryotic
expression technology is expensive in terms of technological
complexity and labour and material costs. Moreover, the time
scale of the development phase required to establish an
expression system is at least several months, even for
laboratory scale production. The nature and extent of
post-translational modification of the recombinant product often
differs from that of the natural product because such
modifications are under indirect genetic control in the host cell.
Sequence signals invoking a post-synthetic modification are
often mutually recognized among eukaryotes, but availability
of the appropriate suite of modification enzymes is given by
the nature and state of the host cell.

A variety of strategies have been developed for expression of
gene products in prokaryotic hosts, which are advantageous over
eukaryotic hosts in terms of capital, labour and material
requirements. Strains of the eubacterium Escherichia coli are
often preferred as host cells because E. coli is far better
characterized genetically than any other organism, also at
the molecular level.

Prokaryotic host cells do not possess the enzymatic machinery
required to carry out post-translational modification, and a
eukaryotic gene product will therefore necessarily be
produced in its unmodified form. Moreover, the product must be
synthesized with an N-terminal extension, at least one
additional methionine residue arising from the required
translation initiation codon, more often also including an
N-terminal segment corresponding to that of a highly expressed
host protein. General methods to remove such N-terminal
extensions by sequence-specific proteolysis at linker
segments inserted at the junction between the N-terminal
extension and the desired polypeptide product have been described
(Enterokinase-cleavable linker sequence: EP 035384, The
Regents of the University of California; Factor Xa-cleavable
linker sequence: EP 161937, Nagai & Thøgersen, Assignee:
Celltech Ltd.).

Over the years a considerable effort has been directed at the
development of strategies for heterologous expression in
prokaryotes to generate recombinant protein products in a
soluble form or fusion protein constructs that allow
secretion from the cell in an active, possibly N-terminally
processed form, an effort resulting in limited success only,
despite recent developments in the chaperone field.
Typically, much time and effort is required to develop and modify an
expression system before even a small amount of soluble and
correctly folded fusion protein product can be isolated. More
often all of the polypeptide product is deposited within the
host cell in an improperly folded state in "inclusion
bodies". This is in particular true when expressing
eukaryotic proteins containing disulphide bridges.

Available methods for in vitro refolding of proteins all
describe processes in which the protein in solution or
non-specifically adsorbed to ion exchange resins etc. is exposed
to solvent, the composition of which is gradually changed
over time from strongly denaturing (and possibly reducing) to
non-denaturing in a single pass. This is often carried out by
diluting a concentrated solution of protein containing 6-8 M
guanidine hydrochloride or urea into a substantial volume of
non-denaturing buffer, or by dialysis of a dilute solution of
the protein in the denaturing buffer against the
non-denaturing buffer. Numerous variants of this basic procedure have
been described, including addition of specific ligands or
cofactors of the active protein and incorporation of polymer
substances like polyethylene oxide (polyethylene glycol),
thought to stabilize the folded structure.
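
As a rough illustration of the dilution variant described above, the residual denaturant concentration follows from a simple mass balance. The sketch below is not part of the original disclosure: the function name and the example volumes are assumptions chosen only to show the arithmetic.

```python
def diluted_concentration(stock_molar: float,
                          sample_volume: float,
                          buffer_volume: float) -> float:
    """Denaturant concentration (M) after mixing a denaturant-containing
    sample with denaturant-free buffer. Volumes may be in any unit as
    long as both use the same one."""
    if sample_volume <= 0 or buffer_volume < 0:
        raise ValueError("volumes must be positive")
    return stock_molar * sample_volume / (sample_volume + buffer_volume)


# Hypothetical example: 1 ml of protein in 6 M guanidine hydrochloride
# diluted into 99 ml of non-denaturing buffer (a 100-fold dilution).
residual = diluted_concentration(6.0, 1.0, 99.0)
print(f"{residual:.2f} M")  # 0.06 M
```

Under these assumed numbers a 100-fold volume increase is needed to bring 6 M denaturant down to 0.06 M, which gives a feel for the solvent volumes a single-pass dilution can require.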

Although efficient variants of the standard in vitro
refolding procedure have been found for a number of specific
protein products, including proteins containing one or more
disulphide bonds, refolding yields are more often poor, and
scale-up is impractical and expensive due to the low
solubility of most incompletely folded proteins, which implies the
use of excessive volumes of solvent.

The common characteristic of all traditional in vitro
refolding protocols is that refolding induced by sudden or
gradual reduction of denaturant is carried out as a
single-pass operation, the yield of which is then regarded as the
best obtainable for the protein in question.



The general field of protein folding has been summarized in a
recent textbook edited by Thomas E. Creighton ("Protein
Folding", ed. Creighton T.E., Freeman 1992) and a more
specific review of practical methods for protein refolding was
published in 1989 by Rainer Jaenicke & Rainer Rudolph (p.
191-223 in "Protein Structure, a practical approach", ed. T.
E. Creighton, IRL Press 1989). Among the numerous more
detailed publications, state-of-the-art reviews like those by
Schein (Schein C. H., 1990, Bio/Technology 8, 308-317) or
Buchner and Rudolph (Buchner J. and Rudolph R., 1991,
Bio/Technology 9, 157-162) may be consulted.

In conclusion, there is a definite need for generally
applicable high-yield methods for the refolding of un- or
misfolded proteins derived from various sources, such as
prokaryotic expression systems or peptide synthesis.

SUMMARY OF THE INVENTION 



It has been found by the inventors that refolding yields can
be greatly increased by taking into account that the protein
folding process is a kinetically controlled process and that
interconversion between folded, unfolded and misfolded
conformers of the protein is subject to hysteresis and
time-dependent phenomena that can be exploited to design a cyclic
denaturation-renaturation process, in which refolded protein
product accumulates incrementally in each cycle at the
expense of unfolded and misfolded conformers, to generate a
new refolding process of much greater potential than the
basic traditional approach.

By the term "folded protein" is meant a polypeptide in (a)
conformational state(s) corresponding to that or those
occurring in the protein in its biologically active form or unique
stable intermediates that in subsequent steps may be
converted to generate the biologically active species. The
covalent structure of the folded protein in terms of
crosslinking between pairs of cysteine residues in the
polypeptide is identical to that of the protein in its
biologically active form.

Accordingly, the term "unfolded protein" refers to a
polypeptide in conformational states less compact and
well-defined than that or those corresponding to the protein in
its biologically active, hence folded, form. The covalent
structure of the unfolded protein in terms of crosslinking
between pairs of cysteine residues in the polypeptide may or
may not be identical to that of the protein in its
biologically active form. Closely related to an unfolded protein is
a "misfolded protein", which is a polypeptide in a
conformational state which is virtually thermodynamically stable,
sometimes even more so than that or those states
corresponding to the protein in its folded form, but which does not
exhibit the same degree, if any, of the biological activity
of the folded protein. As is the case for the unfolded
protein, the covalent structure in terms of crosslinking between
pairs of cysteine residues in the polypeptide may or may not
be the same as that of the folded protein.

By the term "refolded protein" is meant a polypeptide which
has been converted from an unfolded state to attain its
biologically active conformation and covalent structure in
terms of crosslinking between correct pairs of cysteine
residues in the polypeptide.

The new generally applicable protein refolding strategy has
been designed on the basis of the following general
properties of protein structure.

(a) The low solubility of unfolded proteins exposed to
non-denaturing solvents reflects a major driving force inducing
the polypeptide either to form the compact correctly refolded
structure or to misfold and generate dead-end aggregates or
precipitates, which are unable to refold and generate the
correctly refolded structure under non-denaturing conditions
within a reasonable amount of time.

(b) A newly formed dead-end aggregate is more easily
"denatured", i.e. converted into an unfolded form, than the
correctly refolded protein because the structure of the dead-end
aggregate is more disordered. Probably misfolding is also in
general a kinetically controlled process.

(c) An unfolded protein is often not (or only very slowly)
able to refold into the correctly refolded form at denaturant
levels required to denature dead-end aggregates within a
reasonable amount of time.

(d) The body of evidence available to support (b) includes
detailed studies of folding and unfolding pathways and
intermediates for several model proteins. Also illustrative is the
observation made for many disulphide-bonded proteins that the
stability of disulphide bonds against reduction at limiting
concentrations of reducing and denaturing agents is often
significantly different for each disulphide bridge of a given
protein, and that the disulphide bridges in the folded
protein are in general much less prone to reduction or
disulphide exchange than "non-native" disulphide bonds in a
denatured protein or protein aggregate.

The new strategy for a refolding procedure is most easily
illustrated by way of the following theoretical example:

Consider a hypothetical protein - stably folded in a
non-denaturing buffer "A" and stably unfolded in the strongly
denaturing buffer "B" (being e.g. a buffer containing 6 M
guanidine-HCl) - exposed to buffer A or to buffer B and then
subjected to incubation at intermediate levels of
denaturation in mixtures of buffers A and B.

Levels between e.g. 100-75% B lead to conversion of both
folded protein and dead-end aggregated protein to the
unfolded form within a short period of time.

Levels between e.g. 75-50% B lead to conversion of newly
formed dead-end aggregate to the unfolded form, whereas
almost all refolded protein remains in a native-like
structure, stable at least within a period of time of
hours, from which it may snap back into the refolded form
upon removal of the denaturant.

Levels in excess of 10% B prevent rapid formation of
refolded form from unfolded form.

A solvent composition step from 100% B to 0% B converts
unfolded protein to dead-end aggregate (75% yield) and
refolded protein (25% yield).

Let us now subject a sample of this protein, initially in its
unfolded form in 100% B, to a time series of programmed
denaturation-renaturation cycles as illustrated in Fig. 1,
each consisting of a renaturation phase (F_n) (<10% B) and a
denaturation phase (D_n). At the end of the renaturation phase
of cycle (i) the denaturant content is changed to a level
k_i % less than the denaturant level of the previous cycle.
Following a brief incubation the denaturant is again removed,
and the next renaturation phase F_(i+1) entered. Assuming the
denaturation level starts out at 100% B and k_i for each cycle
is fixed at 4%, this recipe will generate a damped series of
"denaturation steps" dying out after 25 cycles.

Through 25 cycles, as outlined above, the accumulation of 
refolded protein would progress as follows: 

In cycles 1 to 6 all of the protein, folded as well as
misfolded, will become unfolded in each of the
denaturation phases D_n.

Cycles 7 through 12: Dead-end aggregates will be
converted to unfolded protein in each step whereas protein
recoverable as refolded product will accumulate in the
following amounts, cycle by cycle: 25%, 44%, 58%, 68%,
76% and 82%.

No further conversions take place through cycles 13 to
25.

The cyclic refolding process would therefore produce a total
refolding yield of over 80%, whereas traditional one-pass
renaturation at best would produce a yield of 25%.
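
The arithmetic of this theoretical example can be checked with a few lines of code. The following is a minimal sketch under the all-or-none assumptions stated in the example (a 25% single-pass yield, a fixed 4% decrement from 100% B); the function names are illustrative and do not appear in the original text.

```python
def denaturation_levels(start: float = 100.0, step: float = 4.0) -> list:
    """Denaturant level (% B) of each successive denaturation phase,
    lowered by a fixed step until it reaches zero."""
    levels = []
    level = start
    while level > 0:
        level = max(level - step, 0.0)
        levels.append(level)
    return levels


def cumulative_yield(productive_cycles: int,
                     single_pass_yield: float = 0.25) -> float:
    """Fraction recovered as refolded product after a number of
    productive cycles, assuming every renaturation phase refolds the
    same fraction of whatever is currently unfolded."""
    unfolded = 1.0
    refolded = 0.0
    for _ in range(productive_cycles):
        newly_refolded = single_pass_yield * unfolded
        refolded += newly_refolded
        # The remainder aggregates and is unfolded again in the next
        # denaturation phase, so it re-enters the following cycle.
        unfolded -= newly_refolded
    return refolded


print(len(denaturation_levels()))  # 25
print([round(100 * cumulative_yield(k)) for k in range(1, 7)])
# [25, 44, 58, 68, 76, 82]
```

In closed form the yield after k productive cycles is 1 - 0.75^k, which is why six productive cycles already exceed 80% while a single pass stays at 25%.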

It will be appreciated that a great number of simplifying
approximations in terms of all-or-none graduation of each
characteristic of the various conformational states of the
hypothetical protein have been made. The basic working
principle, nevertheless, remains similar if a more complicated
set of presumptions is incorporated in the model.

Arranging a practical setup for establishing a cyclic 
denaturation/renaturation protein refolding process can be 
envisaged in many ways. 

The protein in solution could e.g. be held in an
ultrafiltration device, held in a dialysis device or be confined to
one of the phases of a suitable aqueous two-phase system, all
of which might allow the concentration of low-molecular
weight chemical solutes in the protein solution to be
controlled by suitable devices.

Alternatively, the protein could be adsorbed to a suitable
surface in contact with a liquid phase, the chemical
composition of which could be controlled as required. A suitable
surface could e.g. be a filtration device, a hollow-fibre
device or a beaded chromatographic medium. Adsorption of the
protein to the surface could be mediated by non-specific
interactions, e.g. as described in WO 86/05809 (Thomas Edwin
Creighton), by folding-compatible covalent bonds between
surface and protein or via specific designs of affinity
handles in a recombinant derivative of the protein exhibiting
a specific and denaturation-resistant affinity for a suitably
derivatized surface.

WO 94/18227          PCT/DK94/00054

The specific implementation of the cyclic
denaturation/renaturation protein refolding process established to
investigate the potential of the general method was based on a
design of cleavable hybrid proteins (EP 161937, Nagai &
Thøgersen, Assignee: Celltech Ltd.) containing a metal
affinity handle module (EP 0282042 (Heinz Döbeli, Bernhard
Eggimann, Reiner Gentz, Erich Hochuli; Hoffmann-La Roche))
inserted N-terminally to the designed factor Xa cleavage
site. Recombinant proteins of this general design, adsorbed
on nickel-chelating agarose beads, could then be subjected to
the present cyclic refolding process in a chromatographic
column "refolding reactor" perfused with a mixture of
suitable denaturing and non-denaturing buffers, delivered by an
array of calibrated pumps, the flow rates of which were
time-programmed through computer control.
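
The time-programmed buffer delivery can be pictured as a gradient mixer. The following is a minimal sketch, assuming only two pumps (one for non-denaturing buffer A, one for denaturing buffer B) and a constant total flow; the function name, the units and the example figures are illustrative assumptions and are not taken from the apparatus described.

```python
def pump_flow_rates(percent_b: float, total_flow: float = 1.0) -> tuple:
    """Split a constant total flow (e.g. ml/min) between pump A and
    pump B so that the mixed stream contains the requested % B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must lie between 0 and 100")
    flow_b = total_flow * percent_b / 100.0
    flow_a = total_flow - flow_b
    return flow_a, flow_b


# Hypothetical denaturation phase at 72% B with 2 ml/min total flow.
flow_a, flow_b = pump_flow_rates(72.0, total_flow=2.0)
print(f"pump A: {flow_a:.2f} ml/min, pump B: {flow_b:.2f} ml/min")
```

A controller would call such a function once per programmed time point, with the % B value taken from the cycle schedule.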

A general scheme of solid-state refolding entails cycling the
immobilized protein as outlined above or by any other means
and implementations between denaturing and non-denaturing
conditions in a progressive manner, in which the
concentration of the denaturing agent is gradually reduced from high
starting values towards zero over a train of many
renaturation-denaturation cycles. Using this approach it is not
necessary to determine precisely which limiting denaturant
concentration is required to obtain folding yield enrichment
in the course of cycling of the specific protein at hand,
because the progressive train of cycles will go through (up
to) three phases: an early phase in which folded product
present at the end of cycle (i) is completely denatured at
the denaturation step of cycle (i+1), an intermediate
productive phase during which refolded protein accumulates in
increasing quantity, and a late phase during which the
concentration of denaturant is too low to perturb the refolded
protein or any remaining misfolded structures. Subjecting the
protein to a progressing series of denaturation-renaturation
cycles as outlined will therefore include several productive
cycles.
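
To make the three phases concrete, the sketch below sorts the denaturation steps of a progressive cycle train into early, productive and late phases. The two thresholds (75% B and 50% B) and the 4% decrement are simply borrowed from the hypothetical protein of the theoretical example; for a real protein they would be unknown, which is exactly why the progressive scheme sweeps through the whole range.

```python
def classify_cycle(level_percent_b: float,
                   unfolds_native_at: float = 75.0,
                   unfolds_aggregate_at: float = 50.0) -> str:
    """Assign a denaturation step to one of the three phases, given
    assumed thresholds for unfolding native protein and aggregates."""
    if level_percent_b >= unfolds_native_at:
        return "early"        # folded product is denatured again
    if level_percent_b >= unfolds_aggregate_at:
        return "productive"   # only misfolded material is recycled
    return "late"             # denaturant too weak to perturb anything


# Denaturation levels of 25 cycles, each 4% B lower than the last.
levels = [100 - 4 * n for n in range(1, 26)]
phases = [classify_cycle(level) for level in levels]
counts = {name: phases.count(name) for name in ("early", "productive", "late")}
print(counts)  # {'early': 6, 'productive': 6, 'late': 13}
```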

For disulphide-containing proteins progressive
denaturation-renaturation cycling may be enhanced by using equipment
similar to advanced chromatography equipment with on-line
facilities to monitor buffer compositions of folding reactor
effluent. Information on effluent composition with regard to
reductant and disulphide reshuffling reagent concentration
profile would reveal productive cycling, and could therefore
be used as input to an intelligent processor unit, in turn
regulating the progression of denaturant concentration in a
feed-back loop to ensure that most of the cycling effort is
spent within the productive phase of the
denaturation-renaturation cycle train. Such auto-optimization of cycling
conditions would be possible because the analytical system may be
used to measure extent and direction of changes in redox
equilibrium in the buffer stream, measurements that directly
reflect titration of thiol groups/disulphide equivalents in
the immobilized protein sample, and are therefore directly
translatable into average number of disulphide bonds being
disrupted or formed during the various phases of a cycle.
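
One way to picture such a feed-back loop is a rule that slows the descent of the denaturant level whenever the effluent signal indicates productive cycling. The sketch below is purely illustrative: the signal (net disulphide bonds formed per molecule in the last cycle), the threshold and both step sizes are hypothetical parameters and not values from this disclosure.

```python
def next_denaturant_level(current_level: float,
                          net_disulphides_formed: float,
                          coarse_step: float = 4.0,
                          fine_step: float = 1.0,
                          productive_threshold: float = 0.1) -> float:
    """Choose the denaturant level (% B) for the next denaturation
    phase. While the measured net disulphide formation exceeds the
    threshold, the level is lowered in fine steps so that more cycles
    are spent in the productive range; otherwise a coarse step is used."""
    if net_disulphides_formed > productive_threshold:
        step = fine_step
    else:
        step = coarse_step
    return max(current_level - step, 0.0)


# Hypothetical readings: no signal at 76% B, a clear signal at 72% B.
print(next_denaturant_level(76.0, 0.0))  # 72.0
print(next_denaturant_level(72.0, 0.5))  # 71.0
```

A real controller would need calibrated measurements and probably some smoothing; the point here is only that the measured redox change can steer the step size.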

Other possible inputs for the intelligent processor
controlling the progression of cycling include measurements of
ligand binding, substrate conversion, antibody binding
ability and, indeed, any other soluble agent
interacting in distinct ways with misfolded and folded protein,
which in the assessing stage of folding measurement might be
percolated through the refolding reactor and then monitored
in-line in the effluent by suitable analytical devices.

An intelligent monitoring and control system could
furthermore use the available information to direct usable portions
of reactor effluent to salvage/recycling subsystems, thereby
minimizing expenses for large-scale operations.

After execution of the folding procedure the final product
may be eluted from the affinity matrix in a concentrated
form, processed to liberate the mature authentic protein by
cleavage at the designed protease cleavage site and then
subjected to final work-up using standard protein
purification and handling techniques, well-known within the field of
protein chemistry.

DETAILED DISCLOSURE OF THE INVENTION

Thus, the present invention relates to a method for
generating a processed ensemble of polypeptide molecules, in which
processed ensemble the conformational states represented
contain a substantial fraction of polypeptide molecules in
one particular uniform conformation, from an initial ensemble
of polypeptide molecules which have the same amino acid
sequence as the processed ensemble of polypeptide molecules,
comprising subjecting the initial ensemble of polypeptide
molecules to a series of at least two successive cycles each
of which comprises a sequence of

1) at least one denaturing step involving conditions
exerting a denaturing influence on the polypeptide
molecules of the ensemble followed by

2) at least one renaturing step involving conditions
having a renaturing influence on the polypeptide
molecules having conformations resulting from the preceding
step.

In the present specification and claims, the term "ensemble"
is used in the meaning it has acquired in the art, that is,
it designates a collection of molecules having essential
common features. Initially ("an initial ensemble"), they have
at least their amino acid sequence in common (and of course
retain this common feature). When the ensemble of polypeptide
molecules has been treated in the method of the invention (to
result in "a processed ensemble"), the conformational states
represented in the ensemble will contain a substantial
fraction of polypeptide molecules with one particular
conformation. As will be understood from the discussion which
follows, the substantial fraction of polypeptide molecules with
one particular conformation in the processed ensemble may
vary dependent on the parameters of the treatment by the method of
the invention, the size of the protein in the particular
conformation, the length and identity of the amino acid
sequence of the molecules, etc. In the examples reported
herein, in which the process parameters have not yet been
optimized, the fraction of polypeptide molecules with one
particular conformation varied between 15% and 100% of the
ensemble, which in all cases is above what could be obtained
prior to the present invention. In example 13 it is further
demonstrated that purification of the polypeptide molecules
prior to their subjection to the method of the invention
increases the fraction of polypeptide molecules with one
particular conformation.

"Denaturing step" refers to exposure of an ensemble of
polypeptide molecules during a time interval to physical
and/or chemical circumstances which subject the ensemble of
polypeptide molecules to conditions characterized by more
severe denaturing power than those characterizing conditions
immediately prior to the denaturing step.

Accordingly, the term "renaturing step" refers to exposure of
an ensemble of polypeptide molecules during a time interval
to physical and/or chemical circumstances which subject the
ensemble of polypeptide molecules to conditions characterized
by less severe denaturing power than those characterizing
conditions immediately prior to the renaturing step.
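
Because both steps are defined relative to the conditions immediately preceding them, a treatment protocol can be analysed from nothing more than a series of denaturing-power values. The sketch below is one possible reading of these definitions; the numeric profile and the "hold" label for unchanged conditions are assumptions made for illustration.

```python
def label_steps(denaturing_power: list) -> list:
    """Label each transition in a series of denaturing-power values
    (e.g. % B) as a denaturing step, a renaturing step or a hold."""
    labels = []
    for previous, current in zip(denaturing_power, denaturing_power[1:]):
        if current > previous:
            labels.append("denaturing")
        elif current < previous:
            labels.append("renaturing")
        else:
            labels.append("hold")
    return labels


def count_cycles(labels: list) -> int:
    """Count cycles, taking a cycle to be one or more denaturing steps
    followed by one or more renaturing steps (holds are ignored)."""
    moves = [label for label in labels if label != "hold"]
    return sum(1 for first, second in zip(moves, moves[1:])
               if first == "denaturing" and second == "renaturing")


# Hypothetical profile: start in buffer A, then two full cycles.
profile = [0, 96, 0, 92, 0]
steps = label_steps(profile)
print(steps)
print(count_cycles(steps) >= 2)  # True: meets the two-cycle minimum
```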

It will be understood that the "substantial fraction"
mentioned above will depend in magnitude on the ensemble of
polypeptide molecules which are subjected to the method of
the invention. If the processed ensemble of polypeptides
consists of monomeric proteins of relatively short lengths
and without intramolecular disulphide bridges, the method will
in general result in very high yields, whereas complicated
molecules (such as polymeric proteins with a complicated
disulphide bridging topology) may result in lower yields,
even if the conditions of the method of the invention are
fully optimized.

An interesting aspect of the invention relates to a method
described above wherein the processed ensemble comprises a
substantial fraction of polypeptide molecules in one
conformational state, the substantial fraction constituting at
least 1% (w/w) of the initial ensemble of polypeptide
molecules. Higher yields are preferred, such as at least 5%, at
least 10%, at least 20%, and at least 25% of the initial
ensemble of polypeptide molecules. More preferred are yields
of at least 30%, such as at least 40%, 50%, 60%, 70%, and at
least 80%. Especially preferred are yields of at least 85%,
such as 90%, 95%, 97%, and even 99%. Sometimes yields
close to 100% are observed.

When the polypeptide molecules of the ensemble contain
cysteine, the processed ensemble will comprise a substantial
fraction of polypeptide molecules in one particular uniform
conformation which in addition have substantially identical
disulphide bridging topology.

In most cases, the polypeptide molecules subjected to the
method of the invention will be molecules which have an amino
acid sequence identical to that of an authentic polypeptide,
or molecules which comprise an amino acid sequence
corresponding to that of an authentic polypeptide joined to one or two
additional polypeptide segments.

By the term "authentic protein or polypeptide" is meant a
polypeptide with primary structure, including N- and
C-terminal structures, identical to that of the corresponding
natural protein. The term also denotes a polypeptide which
has a known primary structure which is not necessarily
identical to that of a natural protein, which polypeptide is the
intentional end-product of a protein synthesis.

By the term "natural protein" is meant a protein as isolated
in biologically active form from an organism, in which it is
present not as a consequence of genetic manipulation.

In contrast, the term "artificial protein or polypeptide" as
used in the present specification and claims is intended to
relate to a protein/polypeptide which is not available from
any natural sources, i.e. it cannot be isolated and purified
from any natural source. An artificial protein/polypeptide is
thus the result of human intervention, and may for instance
be a product of recombinant DNA manipulation or a form of in
vitro peptide synthesis. According to the above definitions
such an artificial protein may be an authentic protein, but
not a natural protein.

Thus, the invention also relates to a method wherein natural 
proteins as well as artificial proteins are subjected to the
refolding processes described herein. 

As will be explained in greater detail below, it may be
advantageous for various reasons that the authentic
polypeptide is joined to polypeptide segments having auxiliary
functions during the cycling and other previous or
subsequent processing, e.g. as "handles" for binding the
polypeptide to a carrier, as solubility modifiers, as expression
boosters which have exerted their beneficial function
during translation of messenger RNA, etc. Such an auxiliary
polypeptide segment will preferably be linked to the authentic
polypeptide via a cleavable junction, and where two such
auxiliary polypeptide segments are linked to the authentic
polypeptide, this may be via similar cleavable junctions
which will normally be cleaved simultaneously, or through
dissimilar cleavable junctions which may be cleaved in any
time sequence.

In accordance with what is explained above, it is believed to
be a major novel characteristic feature of the present invention
that the cycling (which, as explained above, comprises
at least two successive cycles) will give rise to at least
one event where a renaturing step is succeeded by a denaturing
step where at least a substantial fraction of the
refolded polypeptides will be denatured again.

In most cases, the processing will comprise at least 3
cycles, often at least 5 cycles and more often at least 8
cycles, such as at least 10 cycles and, in some cases, at
least 25 cycles. On the other hand, the series of cycles will
normally not exceed 2000 cycles and will often comprise at
most 1000 cycles and more often at most 500 cycles. The
number of cycles used will depend partly on the possibilities
made available by the equipment in which the cycling is
performed.

Thus, if the cycling treatment is performed with the polypeptide
molecules immobilized to a carrier column, as
will be explained in greater detail below, the rate at
which the liquid phase in contact with the column can be
exchanged will constitute one limit to what can realistically
be achieved. On the other hand, high performance liquid
chromatography (HPLC) equipment will permit very fast
exchange of the liquid environment and thus make cycle numbers
in the range of hundreds or thousands realistic.

Other considerations determining the desirable number of
cycles are, e.g., inherent kinetic parameters such as interconversion
between cis and trans isomers at proline residues,
which will tend to complicate redistribution over the partially
folded states and will thus normally require due
consideration of timing. Another time-critical characteristic
resides in the kinetics of disulphide reshuffling (cf. the
discussion below of disulphide-reshuffling systems).

With due consideration of the above, the cycling series will 
often comprise at most 200 cycles, more often at most 100 
cycles and yet more often at most 50 cycles.

In accordance with what is stated above, the duration of each 
denaturing step may be a duration which, under the particular 
conditions in question, is at least one millisecond and at 
most one hour, and the duration of each renaturing step may 
be a duration which, under the particular conditions in
question, is at least 1 second and at most 12 hours. 

In most embodiments of the method, the denaturing conditions
of each individual denaturing step are kept substantially
constant for a period of time, and the renaturing conditions
of each individual renaturing step are kept substantially
constant for a period of time, the periods of time during
which conditions are kept substantially constant being separated
by transition periods during which the conditions are
changed. The transition period between steps for which conditions
are kept substantially constant may have a duration
varying over a broad range, such as between 0.1 second and 12
hours, and will normally be closely adapted to the durations
of the denaturing and renaturing steps proper.

Bearing this in mind, the period of time for which the denaturing
conditions of a denaturing step are kept substantially
constant may, e.g., have a duration of at least one millisecond
and at most one hour, often at most 30 minutes, and the
period of time for which the renaturing conditions of a
renaturing step are kept substantially constant has a duration
of at least 1 second and at most 12 hours, and often
at most 2 hours.

In practice, the period of time for which the denaturing
conditions of a denaturing step are kept substantially constant
will often have a duration of between 1 and 10 minutes,
and the period of time for which the renaturing conditions of
a renaturing step are kept substantially constant will often
have a duration of between 1 and 45 minutes.
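
The outer limits quoted above for the denaturing step, the renaturing step and the transition period can be written down as a simple bounds check. The following is only a minimal sketch: the `Cycle` container, its field names and the choice of counting two transitions per cycle are illustrative assumptions and are not part of the specification.

```python
from dataclasses import dataclass

# Outer bounds quoted in the text, expressed in seconds.
DENATURE_BOUNDS = (1e-3, 3600.0)         # one millisecond to one hour
RENATURE_BOUNDS = (1.0, 12 * 3600.0)     # one second to 12 hours
TRANSITION_BOUNDS = (0.1, 12 * 3600.0)   # 0.1 second to 12 hours


@dataclass(frozen=True)
class Cycle:
    """One denaturation/renaturation cycle (hypothetical container)."""
    denature_s: float
    renature_s: float
    transition_s: float  # assumed equal for both transitions of a cycle

    def within_stated_bounds(self) -> bool:
        """True if every duration lies inside the quoted outer limits."""
        checks = (
            (self.denature_s, DENATURE_BOUNDS),
            (self.renature_s, RENATURE_BOUNDS),
            (self.transition_s, TRANSITION_BOUNDS),
        )
        return all(lo <= value <= hi for value, (lo, hi) in checks)

    def total_s(self) -> float:
        # Assumption: one transition into the denaturing step and one
        # transition back to renaturing conditions per cycle.
        return self.denature_s + self.renature_s + 2 * self.transition_s


def total_run_hours(cycle: Cycle, n_cycles: int) -> float:
    """Total duration of n identical cycles, in hours."""
    return n_cycles * cycle.total_s() / 3600.0
```

For instance, a cycle with 5 minutes of denaturing, 20 minutes of renaturing and 1-minute transitions lies inside the practical ranges given above, and 20 such cycles take 9 hours in total.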

It will be understood from the above that adjustments should
be made to the intervals stated above, taking into consideration
the change of kinetics resulting from the change in
physical conditions to which the polypeptides are subjected.
For instance, the pressure may be very high (up to 5000 bar)
when using an HPLC system to perform the method of the
invention, and under such circumstances very rapid steps may
be accomplished and/or necessary. Further, as can be seen
from the examples, the temperature parameter is of importance,
as some proteins will only refold properly at temperatures
far from the physiological range. Both temperature and
pressure will of course have an effect on the kinetics of the
refolding procedure of the invention, and therefore the
above-indicated time intervals of renaturing and denaturing
steps are realistic boundaries for the many possible embodiments
of the invention.

For a given utilization of the method of the invention, the 
skilled person will be able to determine suitable conditions 
based, e.g., on preliminary experiments. 

As indicated above, the polypeptide molecules are normally in
contact with a liquid phase during the denaturing and renaturing
steps, the liquid phase normally being an aqueous phase.
This means that any reagents or auxiliary substances used in
the method will normally be dissolved in the liquid phase,
normally in an aqueous phase. However, if convenient, the
liquid phase may also be constituted by one or more organic
solvents.

In connection with renaturing of proteins, it is well known
to use a so-called "chaperone" or "chaperone complex". Chaperones
are a group of recently described proteins that show a
common feature in their capability of enhancing refolding of
unfolded or partly unfolded proteins. Often, the chaperones
are multimolecular complexes. Many of these chaperones are
heat-shock proteins, which means that in vivo they
serve as factors doing post-traumatic "repair" on proteins
that have been destabilized by the trauma. To be able to
fulfil this function, chaperones tend to be more stable to
traumatic events than many other proteins and protein complexes.
While the method of the invention does not depend on
the use of a molecular chaperone or a molecular chaperone
complex, it is, of course, possible to have a suitable molecular
chaperone or molecular chaperone complex present
during at least one renaturing step, and it may be preferred
to have a molecular chaperone or a molecular chaperone complex
present during substantially all cycles.

As mentioned above, the polypeptide molecules are preferably 
substantially confined to an environment which allows changing
or exchanging the liquid phase substantially without
entraining the polypeptide molecules. 

This can be achieved in a number of ways. For instance, the
polypeptide molecules may be contained in a dialysis device,
or they may be confined to one of the phases of a suitable
liquid two-phase system. Such a suitable aqueous two-phase
system may, e.g., contain a polymer selected from the group
consisting of polyethylene oxide (polyethylene glycol),
polyvinyl acetate, dextran and dextran sulphate. In one
interesting setup, one phase contains polyethylene oxide
(polyethylene glycol) and the other phase contains dextran,
whereby the polypeptide molecules will be confined to the
dextran-containing phase.

Another way of avoiding entraining the polypeptide is by having
the polypeptide molecules bound to a solid or semisolid
carrier, such as a filter surface, a hollow fibre or a beaded
chromatographic medium, e.g. an agarose or polyacrylamide
gel, a fibrous cellulose matrix or an HPLC or FPLC (Fast
Performance Liquid Chromatography) matrix. As another
measure, the carrier may be a substance having molecules of
such a size that the molecules with the polypeptide molecules
bound thereto, when dissolved or dispersed in a liquid phase,
can be retained by means of a filter, or the carrier may be a
substance capable of forming micelles or participating in the
formation of micelles allowing the liquid phase to be changed
or exchanged substantially without entraining the micelles.
In cases where the micelle-forming components would tend to
escape from the system as monomers, e.g. where they would be
able to some extent to pass an ultrafilter used in confining
the system, this could be compensated for by replenishment
with additional micelle-forming monomer.

The carrier may also be a water-soluble polymer having molecules
of a size which will substantially not be able to
pass through the pores of a filter or other means used in confining
the system.

The polypeptide molecules are suitably non-covalently
adsorbed to the carrier through a moiety having affinity to a
component of the carrier. Such a moiety may, e.g., be a
biotin group or an analogue thereof bound to an amino acid
moiety of the polypeptide, the carrier having avidin, streptavidin
or analogues thereof attached thereto so as to establish
a system with a strong affinity between the thus modified
polypeptide molecules and the thus modified carrier.

It will be understood that the affinity between the modified
polypeptide and the modified carrier should be sufficiently
stable so that the adsorption will be substantially unaffected
by the denaturing conditions; the removal of the
polypeptide molecules from the carrier after the cycling
should be performed using specific cleaving, such as is
explained in the following.

An example of a suitable amino acid residue to which a biotinyl
group may be bound is lysine.

One interesting way of introducing an amino acid carrying a
moiety having affinity to the carrier is CPY synthesis. CPY
(carboxypeptidase Y) is known to be capable of adding an amino
acid amide irrespective of the nature of the side chain of
that amino acid amide.

In an interesting embodiment, the moiety having affinity to
the carrier is the polypeptide segment SEQ ID NO: 47, in
which case the carrier suitably comprises a nitrilotriacetic
acid derivative (NTA) charged with Ni++ ions, for instance an
NTA-agarose matrix which has been bathed in a solution comprising
Ni++.

An important aspect of the invention relates to the presence
of suitable means in the polypeptide molecule preparing the
molecule for later cleavage into two or more segments, wherein
one segment is an authentic polypeptide as defined above.
Such combined polypeptide molecules (fusion polypeptide
molecules) may for this purpose comprise a polypeptide segment
which is capable of directing preferential cleavage by a
cleaving agent at a specific peptide bond. The polypeptide
segment in question may be one which directs the cleavage as
a result of the conformation of the segment which serves as a
recognition site for the cleaving agent.

The cleavage-directing polypeptide segment may for instance
be capable of directing preferential cleavage at a specific
peptide bond by a cleaving agent selected from the group
consisting of cyanogen bromide, hydroxylamine, iodosobenzoic
acid and N-bromosuccinimide.

The cleavage-directing polypeptide segment may be one which
is capable of directing preferential cleavage at a specific
peptide bond by a cleaving agent which is an enzyme, and one
such possible enzyme is bovine enterokinase or an analogue
and/or homologue thereof.

In an important aspect of the invention, the cleaving agent
is the enzyme bovine coagulation factor Xa or an analogue
and/or homologue thereof (such analogues will be discussed in
greater detail further below), and the polypeptide segment
which directs preferential cleavage is a sequence which is
substantially selectively recognized by the bovine coagulation
factor Xa or an analogue and/or homologue thereof.
Important such segments are polypeptide segments that have a
sequence selected from the group consisting of SEQ ID NO: 38,
SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42.

An interesting feature of the invention is the possibility of
masking and unmasking polypeptide segments with respect to
their ability to direct cleavage at a specific peptide bond,
whereby it is obtained that different segments of the
polypeptide can be cleaved at different stages in the cycles.

Thus, when the polypeptide molecules comprise a polypeptide
segment which is in vitro-convertible into a derivatized
polypeptide segment capable of directing preferential cleavage
by a cleaving agent at a specific peptide bond, a masking/unmasking
effect as mentioned becomes available. An
especially interesting version of this strategy is where the
in vitro-convertible polypeptide segment is convertible into
a derivatized polypeptide segment which is substantially
selectively recognized by the bovine coagulation factor Xa or
an analogue and/or homologue thereof.

It is contemplated that both cysteine and methionine residues
can be converted into modified residues, which modified
residues make the segments having amino acid sequences
selected from the group consisting of SEQ ID NO: 43, SEQ ID
NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46 in vitro-convertible
into segments recognized by bovine coagulation factor Xa or
an analogue and/or homologue thereof.

According to the invention, one possible solution involving
the cysteine residue is that a polypeptide segment with the
amino acid sequence SEQ ID NO: 43 or SEQ ID NO: 44 is converted
into a derivatized polypeptide which is substantially
selectively recognized by bovine coagulation factor Xa, by
reacting the cysteine residue with N-(2-mercaptoethyl)morpholyl-2-thiopyridyl
disulphide or mercaptothioacetate-2-thiopyridyl
disulphide.

A possible strategy according to the invention involving
methionine is that a polypeptide segment with the amino acid
sequence SEQ ID NO: 45 or SEQ ID NO: 46 is converted into a
derivatized polypeptide, which is substantially selectively
recognized by bovine coagulation factor Xa, by oxidation of
the thioether moiety in the methionine side group to a sulphoxide
or sulphone derivative.

Preferred embodiments of the method according to the invention
are those wherein the cleavage-directing segments with
the amino acid sequences SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID
NO: 41 or SEQ ID NO: 42, or the masked cleavage-directing
segments with the amino acid sequences SEQ ID NO: 43, SEQ ID
NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46, are linked N-terminally
to the authentic polypeptide, because then no further
processing other than the selective cleaving is necessary in
order to obtain the authentic polypeptide in solution. On the
other hand, one possible reason for linking the cleavage-directing
sequences at the C-terminal end of the authentic
polypeptide would be that the correct folding of the
polypeptide molecules is dependent on a free N-terminal of
the polypeptide molecules. In such a case, the part of the
cleavage-directing sequence remaining after cleaving can be
removed by suitable use of carboxypeptidases A and B.

The change of conditions during the transition period between
the steps may according to the invention be accomplished by
changing the chemical composition of the liquid phase with
which the polypeptide molecules are in contact. Thus, denaturing
of the polypeptide molecules may be accomplished by
contacting the polypeptide molecules with a liquid phase in
which at least one denaturing compound is dissolved, and
renaturing of the polypeptide molecules is accomplished by
contacting the polypeptide molecules with a liquid phase
which either contains at least one dissolved denaturing
compound in such a concentration that the contact with the
liquid phase will tend to renature rather than denature the
ensemble of polypeptide molecules in their respective conformation
states resulting from the preceding step, or contains
substantially no denaturing compound.

The expression "denaturing compound" refers to a compound
which, when present as one of the solutes in a liquid phase
comprising polypeptide molecules, may destabilize folded
states of the polypeptide molecules, leading to partial or
complete unfolding of the polypeptide chains. The denaturing
effect exerted by a denaturing compound increases with increasing
concentration of the denaturing compound in the
solution, but may furthermore be enhanced or moderated due to
the presence of other solutes in the solution, or by changes
in physical parameters, e.g. temperature or pressure.

As examples of suitable denaturing compounds to be used in
the method according to the invention may be mentioned urea,
guanidine-HCl, di-C1-6-alkylformamides such as dimethylformamide,
and di-C1-6-alkylsulphones.

The liquid phase used in at least one of the denaturing steps
and/or in at least one of the renaturing steps may according
to the invention contain at least one disulphide-reshuffling
system.

"Disulphide reshuffling systems" are redox systems which
contain mixtures of reducing and oxidizing agents, the presence
of which facilitates the breaking and making of
disulphide bonds in a polypeptide or between polypeptides.
Accordingly, "disulphide reshuffling agents" or "disulphide
reshuffling compounds" are such reducing and oxidizing agents
which facilitate the breaking and making of disulphide bonds
in a polypeptide or between polypeptides. In an important
aspect of the invention, the disulphide-reshuffling system
contained in the aqueous phase which is in contact with the
proteins comprises a mixture of a mercaptan and its
corresponding disulphide compound.

As an example, all cysteine residues in the polypeptide
molecules may have been converted to mixed disulphide products
of either glutathione, thiocholine, mercaptoethanol or
mercaptoacetic acid, during at least one of the denaturing/renaturing
cycles. Such a converted polypeptide is termed
a "fully disulphide-blocked polypeptide or protein" and this
term thus refers to a polypeptide or a protein in which
cysteine residues have been converted to a mixed disulphide
in which each cysteine residue is disulphide-linked to a
mercaptan, e.g. glutathione. The conversion of the cysteine
residues to mixed disulphide products may be accomplished by
reacting a fully denatured and fully reduced ensemble of
polypeptide molecules with an excess of a reagent which is a
high-energy mixed disulphide compound, such as an aliphatic-aromatic
disulphide compound, e.g. 2-thiopyridyl glutathionyl
disulphide, or by any other suitable method.

As examples of high-energy mixed disulphides (that is, mixed
disulphides having a relatively unstable S-S bond) may be
mentioned mixed disulphides having the general formula:

            R2
            |
    R1-S-S-C-R3
            |
            R4

wherein R1 is 2-pyridyl, and each of R2, R3 and R4 is hydrogen
or an optionally substituted lower aromatic or aliphatic
hydrocarbon group. Examples of such mixed disulphides are
glutathionyl-2-thiopyridyl disulphide, 2-thiocholyl-2-thiopyridyl
disulphide, 2-mercaptoethanol-2-thiopyridyl disulphide
and mercaptoacetate-2-thiopyridyl disulphide.

In interesting embodiments, the disulphide-reshuffling system
contains glutathione, 2-mercaptoethanol or thiocholine, each
of which in admixture with its corresponding symmetrical
disulphide.

The suitability of a given mixture of thiols for use as
selective reducing and/or disulphide-reshuffling system in a
cyclic refolding/reoxidation procedure for a specific protein
product can be directly assayed by incubating ensembles of
samples of a mixture of folded and misfolded protein with an
array of thiol mixtures at several different concentrations
of denaturant exerting weakly, intermediately or strongly
denaturing effects on the protein. Following incubation, the
disulphide topology in each sample is then locked by reaction
with an excess of thiol-blocking reagent (e.g. iodoacetamide)
before subjecting each set of samples to SDS-PAGE under non-reducing
conditions. Correctly disulphide-bridged material
and material in undesired covalent topological states will
appear in separate bands and will therefore allow quantitative
assessment of the folding state of the protein at the time
of thiol-blocking, because only the unique, correctly
disulphide-bonded topoisomer can correspond to correctly folded protein
present at the end of incubation with thiol/disulphide and
denaturant agents. This set of experiments allows identification
of the range of denaturant levels at which a given
thiol/disulphide reagent may be advantageously used as
disulphide reshuffling agent, as revealed by preferential
reduction and reshuffling of wrong disulphide bonds and low
tendency to reduce bonds in the fully folded protein. This
reagent testing procedure may be used as a general procedure
for selecting advantageous reducing and/or thiol/disulphide
reshuffling reagents. Example 12 demonstrates application of
this analytical procedure to assess the suitability for
selective reduction of misfolded forms of a model protein for
five thiol reagents and thereby demonstrates the operability of
the above procedure.
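
The layout of such a screening experiment, one sample per combination of thiol mixture and denaturant level, can be enumerated programmatically. The sketch below is illustrative only: the function name, the three reagents chosen from those named in this specification and the urea concentrations are assumptions and do not reproduce the conditions of Example 12.

```python
from itertools import product


def assay_grid(thiol_mixtures, denaturant_molar):
    """Return one sample description per (thiol mixture, denaturant level)."""
    return [
        {"thiol_mixture": mixture, "denaturant_M": conc}
        for mixture, conc in product(thiol_mixtures, denaturant_molar)
    ]


# Illustrative inputs only; the concentrations are hypothetical and are
# not taken from the examples of the specification.
mixtures = ["glutathione", "2-mercaptoethanol", "thiocholine"]
urea_molar = [0.0, 2.0, 4.0, 6.0, 8.0]

samples = assay_grid(mixtures, urea_molar)
# Each entry corresponds to one incubation that is subsequently blocked
# with a thiol-blocking reagent and analysed by non-reducing SDS-PAGE.
```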

It will be understood that the above-indicated procedure for
selecting suitable disulphide reshuffling systems may also be
employed for selecting other compositions than mixtures of
thiols. Any mixture containing suitable reducing/oxidizing
agents may be evaluated according to the above-indicated
procedure, and the composition of choice in the method of the
invention will be the one which shows the highest ability to
preferentially reduce incorrectly formed disulphide bridges.

Thus, a very important aspect of the invention is a method
for protein refolding as described herein, wherein at least
one disulphide-reshuffling system contained in the liquid phase
in at least one renaturing and/or denaturing step is one
which is capable of reducing and/or reshuffling incorrectly
formed disulphide bridges under conditions with respect to
concentration of the denaturing agent at which unfolded
and/or misfolded proteins are denatured and at which there is
substantially no reduction and/or reshuffling of correctly
formed disulphide bridges.

An interesting embodiment of the invention is a method as
described above, wherein a disulphide reshuffling system is
used in at least one denaturing/renaturing step and results
in a ratio between the relative amount of reduced/reshuffled
initially incorrectly formed disulphide bridges and the
relative amount of reduced/reshuffled initially correctly
formed disulphide bridges of at least 1.05. The ratio will
preferably be higher, such as 1.1, 1.5, 2.0, 3.0, 5.0, 10,
100, 1000, but even higher ratios are realistic and are thus
especially preferred according to the invention.

By the terms "initially incorrectly/correctly" with respect
to the form of disulphide bridges is meant the disulphide
bridging topology just before the disulphide reshuffling
system exerts its effects.



It will be understood that the ratio has to be greater than 1
in order to allow the net formation of correctly formed
disulphide bridges in a protein sample. Normally the ratio
should be as high as possible, but even ratios which are
marginally above 1 will allow the net formation of correctly
formed disulphide bridges in the method of the invention, the
important parameter in ensuring a high yield being the number
of denaturing/renaturing cycles. Ratios just above one
require that many cycles are completed before a substantial
yield of correctly formed disulphide bridges is achieved,
whereas high ratios only require a limited number of cycles.

In cases where only one disulphide reshuffling system is
going to be employed, such a disulphide reshuffling system may
according to the invention be selected by

1) incubating samples of folded and misfolded protein of the
   same amino acid sequence as the protein to be processed
   in the method of the invention with an array of
   disulphide reshuffling systems at several different
   concentrations of a chosen denaturing agent,

2) assessing at each of the different concentrations of
   denaturing agent the ability of each of the disulphide
   reshuffling systems to reduce and/or reshuffle initially
   incorrectly formed disulphide bridges without substantially
   reducing and/or reshuffling initially correctly
   formed disulphide bridges, as assessed by calculating the
   ratio between the relative amount of reduced/reshuffled
   initially incorrectly formed disulphide bridges and the
   relative amount of reduced/reshuffled initially correctly
   formed disulphide bridges, and

3) selecting as the disulphide reshuffling system the
   disulphide reshuffling system which exhibits the capability
   of reducing initially incorrectly formed
   disulphide bridges without substantially reducing and/or
   reshuffling initially correctly formed disulphide bridges
   in the widest range of concentrations of the chosen
   denaturing agent.
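
Step 3) above can be illustrated as a small selection routine. This is a sketch under stated assumptions rather than the procedure of the specification: "widest range" is interpreted here as the widest contiguous run of tested denaturant concentrations at which the ratio reaches a threshold, the default threshold of 1.05 is the minimum ratio mentioned earlier in the text, and the function names and input layout are hypothetical.

```python
def selective_window(ratio_by_conc, threshold=1.05):
    """Widest contiguous run of tested concentrations with ratio >= threshold.

    ratio_by_conc maps denaturant concentration -> measured ratio.
    Returns (low, high), or None if no tested concentration qualifies.
    """
    best = None
    run_start = None
    for conc in sorted(ratio_by_conc):
        if ratio_by_conc[conc] >= threshold:
            if run_start is None:
                run_start = conc
            if best is None or conc - run_start > best[1] - best[0]:
                best = (run_start, conc)
        else:
            run_start = None  # the run is broken at this concentration
    return best


def select_system(ratios, threshold=1.05):
    """Pick the system whose selective window spans the widest range.

    ratios maps system name -> {denaturant concentration: ratio}.
    Returns (name, (low, high)), or None if no system qualifies.
    """
    windows = {name: selective_window(r, threshold) for name, r in ratios.items()}
    qualified = {name: w for name, w in windows.items() if w is not None}
    if not qualified:
        return None
    name = max(qualified, key=lambda n: qualified[n][1] - qualified[n][0])
    return name, qualified[name]
```

With hypothetical measurements in which system "A" is selective only between 2 M and 4 M denaturant while system "B" is selective from 0 M to 6 M, `select_system` returns system "B" together with its window.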



Alternatively, more than one disulphide reshuffling system may
be employed, for instance in different cycles in the cyclic
refolding method of the invention, but also simultaneously in
the same cycles. This will e.g. be the case when it is likely
or has been established by e.g. the method outlined above
that the overall yield of correctly folded protein with
correct disulphide bridging topology will be higher if using
different disulphide reshuffling systems in the method of the
invention.



In order to calculate the above-indicated ratio between
the relative amount of reduced/reshuffled initially incorrectly
formed disulphide bridges and the relative amount of
reduced/reshuffled initially correctly formed disulphide
bridges, the following method may be employed: to the initial
mixture of reactants in step 1) is added a known amount of
radioactively labelled correctly folded protein. When the
amounts of correctly and incorrectly folded protein are
assessed in step 2) (for instance by non-reducing SDS-PAGE),
the content of radioactivity in the correctly folded protein
fraction is determined as well. Thereby the amount of
now incorrectly folded (but initially correctly folded)
protein can be determined in parallel with the determination
of the total distribution of correctly/incorrectly folded
protein. The above-mentioned ratio can thus be calculated as

[equation not legible in the source text]

wherein C1 and C2 are the initial and the final amounts of
correctly folded proteins, respectively, U1 is the amount of
initially incorrectly folded protein, and A1 and A2 are the
radioactivity in the initial correctly folded protein fraction
and in the final correctly folded protein, respectively.
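
The equation itself is not legible in this copy of the text. The following sketch therefore shows only one reconstruction that is consistent with the definitions given for the five measured quantities; it is an assumption and should not be read as the formula of the specification. It takes the surviving labelled fraction A2/A1 as the share of initially correct protein that is still correct, attributes the remainder of the final correct pool to initially incorrect protein, and assumes that initially incorrect protein is counted as reduced/reshuffled only when it ends up correctly folded.

```python
def reshuffling_ratio(c1, c2, u1, a1, a2):
    """Ratio of relative conversion of initially incorrect protein to
    relative loss of initially correct protein (reconstruction, see text).

    c1, c2: initial and final amounts of correctly folded protein
    u1:     amount of initially incorrectly folded protein
    a1, a2: radioactivity in the initial and final correctly folded fraction
    """
    if c1 <= 0 or u1 <= 0 or a1 <= 0:
        raise ValueError("c1, u1 and a1 must be positive")
    surviving = a2 / a1                  # share of initially correct protein still correct
    lost_correct = 1.0 - surviving       # relative amount of correct protein reshuffled
    newly_correct = c2 - c1 * surviving  # correct protein that was initially incorrect
    converted_incorrect = newly_correct / u1
    if lost_correct <= 0:
        # No measurable loss of labelled correct protein: the ratio is unbounded.
        return float("inf")
    return converted_incorrect / lost_correct
```

As a hypothetical worked case, with C1 = 40, C2 = 70, U1 = 60, A1 = 1000 and A2 = 900 (arbitrary units), 10% of the initially correct protein has been lost, 34 of the 60 units of initially incorrect protein have become correct, and the ratio is about 5.7.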

In addition to the denaturing means mentioned above, denaturing
may also be achieved or enhanced by decreasing pH of
the liquid phase, or by increasing pH of the liquid phase.

The polarity of the liquid phase used in the renaturing may
according to the invention have been modified by the addition
of a salt, a polymer and/or a hydrofluoro compound such as
trifluoroethanol.

According to the invention, the denaturing and renaturing of
the polypeptide molecules may also be accomplished by direct
changes in physical parameters to which the polypeptide
molecules are exposed, such as temperature or pressure, or
these measures may be utilized to enhance or moderate the
denaturing or renaturing resulting from the other measures
mentioned above.

However, it will be understood that a most important practical
embodiment of the method is performed by accomplishing
chemical changes in the liquid phase by changing between a
denaturing solution B and a renaturing solution A. In this
case, the concentration of one or more denaturing compounds
in B will often be adjusted after each cycle, and as one
important example, the concentration of one or more denaturing
compounds in B will be decremented after each cycle, but
in another important embodiment, the concentration of one or
more denaturing compounds in medium B is kept constant in
each cycle.
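
The two embodiments just described, a denaturant concentration in solution B that is decremented after each cycle and one that is kept constant, can be expressed as a per-cycle schedule. The sketch below assumes a fixed linear decrement and an optional lower limit; the function name, the parameters and the linear form are illustrative assumptions, since the text does not prescribe how the concentration is to be adjusted.

```python
def denaturant_schedule(start_molar, step_molar, n_cycles, floor_molar=0.0):
    """Denaturant concentration in solution B for each successive cycle.

    start_molar: concentration used in the first cycle
    step_molar:  amount subtracted after each cycle (0 keeps B constant)
    n_cycles:    number of cycles; the method uses at least two
    floor_molar: concentration below which B is not reduced further
    """
    if n_cycles < 2:
        raise ValueError("the cycling comprises at least two successive cycles")
    if step_molar < 0:
        raise ValueError("step_molar must be zero or positive")
    return [max(floor_molar, start_molar - i * step_molar) for i in range(n_cycles)]
```

Calling `denaturant_schedule(8.0, 0.5, 5)` gives a stepwise decreasing series starting at 8 M, whereas a step of zero reproduces the constant-concentration embodiment.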

This embodiment of the invention, wherein the concentration
of denaturing compound(s) in medium B is kept constant, is
especially interesting when the most productive phase of the
cycling process (with respect to correctly folded protein)
has been identified, and large scale production of correctly
folded protein is desired. As will be understood, the preferred
concentration(s) of denaturing compound(s) of medium B
in this embodiment is the concentration(s) which has been
established to ensure maximum productivity in the cyclic
process according to the invention.



The polypeptide molecules of the ensemble which is subjected
to the method of the invention normally have a length of at
least 25 amino acid residues, such as at least 30 amino acid
residues or at least 50 amino acid residues.

On the other hand, the polypeptide molecules of the ensemble
normally have a length of at most 5000 amino acid residues,
such as at most 2000 amino acid residues or at most 1000 or
800 amino acid residues.

As can be seen from example 10, the method of the invention 
has made possible the production of correctly folded diabody 
molecules (diabodies are described in Holliger et al., 1993). 



An important aspect of the invention therefore relates to a
method for producing correctly folded diabody molecules,
wherein an initial ensemble of polypeptide molecules comprising
unfolded and/or misfolded polypeptides having amino
acid sequences identical to the amino acid sequences of
monomer fragments of diabody molecules is subjected to a
series of at least two successive cycles, each of which
comprises a sequence of

1) at least one denaturing step involving conditions
   exerting a denaturing influence on the polypeptide molecules
   of the ensemble, followed by

2) at least one renaturing step involving conditions
   having a renaturing influence on the polypeptide molecules
   having conformations resulting from the preceding
   step,

the series of cycles being so adapted that a substantial
fraction of the initial ensemble of polypeptide molecules is
converted to a fraction of correctly folded diabody molecules.



Such a method for the correct folding of diabodies can be
envisaged in any of the above-mentioned scenarios and aspects
of the refolding method of the invention, that is, with
respect to the choice of physical/chemical conditions as well
as cycling schedules. However, an important aspect of the
method for correct folding of diabodies is a method as
identified above, wherein the polypeptide molecules are in
contact with a liquid phase containing at least one
disulphide reshuffling system in at least one denaturing or
renaturing step. The preferred denaturing agent to be used in
such a liquid phase is urea, and the preferred disulphide
reshuffling system comprises glutathione as the main reducing
agent.

A particular aspect of the invention relates to a polypeptide
which is a proenzyme of a serine protease, but is different
from any naturally occurring serine protease and, in particular,
has an amino acid sequence different from that of bovine
coagulation factor X (Protein Identification Resource (PIR),
National Biomedical Research Foundation, Georgetown University
Medical Center, U.S.A., entry: P1;EXBO) and which can
be proteolytically activated to generate the active serine
protease by incubation of a solution of the polypeptide in a
non-denaturing buffer with a substance that cleaves the
polypeptide to liberate a new N-terminal residue,

the substrate specificity of the serine protease being
identical to or better than that of bovine blood coagulation
factor Xa, as assessed by each of the ratios
k(I)/k(V) and k(III)/k(V) between the cleavage rate against
each of the substrates I and III:

I: Benzoyl-Val-Gly-Arg-paranitroanilide,

III: Tosyl-Gly-Pro-Arg-paranitroanilide,

versus that against the substrate

V: Benzoyl-Ile-Glu-Gly-Arg-paranitroanilide

at 20 °C, pH=8 in a buffer consisting of 50 mM Tris, 100
mM NaCl, 1 mM CaCl2, being identical to or lower than the
corresponding ratio determined for bovine coagulation
factor Xa which is substantially free from contaminating
proteases.

The characterization of the above-identified new polypeptides
as serine proteases is in accordance with the normal
nomenclatural use of the term serine proteases. As is well known
in the art, serine proteases are enzymes which are believed
to have a catalytic system consisting of an active site
serine which is aligned with a histidine residue, and it is
believed that the activation of the enzymes from the corresponding
proenzymes is based on the liberation of a new
N-terminal residue, the α-amino group of which is capable of
repositioning within the polypeptide structure to form a salt
bridge to an aspartic acid residue preceding an active-site
serine residue, thereby forming the catalytic site
characteristic of serine proteases.

The "artificial" serine proteases defined above are extremely
valuable polypeptide cleaving tools for use in the method of
the invention and in other methods where it is decisive to
have a cleaving tool which will selectively cleave proteins,
even large folded proteins. Analogously to bovine coagulation
factor Xa, the above-defined artificial serine proteases in
activated form are capable of selectively recognizing the
cleavage-directing polypeptide segment SEQ ID NO: 38, but in
contrast to bovine coagulation factor Xa, they can be established
with such amino acid sequences that they can be readily
produced using recombinant DNA techniques. Thus, the
preferred artificial serine proteases of the invention are
ones which have amino acid sequences allowing their synthesis
by recombinant DNA techniques, in particular in prokaryote
cells such as E. coli. As will appear from the following
discussion and the examples, the artificial serine proteases
of the invention, when produced in a prokaryote, may be given
an enzymatically active conformation, in which the catalytically
active domains are suitably exposed, by cycling according
to the method of the present invention.

The quantitative test for selectivity of the artificial
serine proteases involves determination of the cleavage rate,
k, determined as the initial slope of a curve of absorption
of light at 405 nm (absorption maximum of free paranitroaniline)
versus time at 20 °C.

Expressed quantitatively, the selectivity of the artificial
serine proteases should be characterized by the value of
k(I)/k(V) being at most 0.06, and the value k(III)/k(V)
being at most 0.5. It is preferred that k(I)/k(V) is at most
0.05 and k(III)/k(V) is at most 0.4, and more preferred that
k(I)/k(V) is at most 0.04 and k(III)/k(V) is at most 0.15.
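
As an illustration of the test just described, the cleavage rate k
can be estimated as a least-squares slope through the first few
absorbance readings, and the two ratios compared with the limits
0.06 and 0.5. The sketch below is not taken from the examples; the
function names, the choice of a straight-line fit and the number of
points used for the initial slope are assumptions.

```python
def initial_rate(times, absorbances, n_points=5):
    """Least-squares slope of A405 versus time over the first n_points readings.

    Using a straight-line fit through the earliest readings is an
    assumption; the text only specifies 'the initial slope'.
    """
    t = list(times[:n_points])
    a = list(absorbances[:n_points])
    n = len(t)
    if n < 2 or len(a) != n:
        raise ValueError("need at least two paired readings")
    t_mean = sum(t) / n
    a_mean = sum(a) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((ti - t_mean) * (ai - a_mean) for ti, ai in zip(t, a))
    return sxy / sxx


def meets_basic_selectivity(k_I, k_III, k_V):
    """True if k(I)/k(V) <= 0.06 and k(III)/k(V) <= 0.5."""
    return (k_I / k_V) <= 0.06 and (k_III / k_V) <= 0.5
```

The rates for substrates I, III and V must of course be measured
under the same conditions for the ratios to be meaningful.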

A more comprehensive specificity characterization involves
further model substrates: thus, the substrate specificity
could be assessed to be identical to or better than that of
bovine blood coagulation factor Xa by each of the ratios
k(I)/k(V), k(II)/k(V), k(III)/k(V) and k(IV)/k(V) between the
cleavage rate against each of the substrates I-IV:

I: Benzoyl-Val-Gly-Arg-paranitroanilide,

II: Tosyl-Gly-Pro-Lys-paranitroanilide,

III: Tosyl-Gly-Pro-Arg-paranitroanilide,

IV: (d,l)Val-Leu-Arg-paranitroanilide

versus that against the substrate

V: Benzoyl-Ile-Glu-Gly-Arg-paranitroanilide

at 20 °C, pH=8 in a buffer consisting of 50 mM Tris, 100
mM NaCl, 1 mM CaCl2, being identical to or lower than the
corresponding ratio determined for bovine coagulation
factor Xa which is substantially free from contaminating
proteases.

Within this characterization, k(I)/k(V) should be at most
0.06, k(II)/k(V) should be at most 0.03, k(III)/k(V) should
be at most 0.5, and k(IV)/k(V) should be at most 0.01, and
it is preferred that k(I)/k(V) is at most 0.05, k(II)/k(V)
is at most 0.025, k(III)/k(V) is at most 0.4, and k(IV)/k(V)
is at most 0.008, and more preferred that k(I)/k(V) is at
most 0.04, k(II)/k(V) is at most 0.015, k(III)/k(V) is at
most 0.15, and k(IV)/k(V) is at most 0.005.
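
The three sets of limits given above can be collected in a table
and applied to measured rates. The following sketch does this; the
limits are those stated in the text, whereas the data layout and
the function name are illustrative assumptions.

```python
# Upper limits for k(substrate)/k(V), as stated in the text.
LIMITS = {
    "required":       {"I": 0.06, "II": 0.03,  "III": 0.5,  "IV": 0.01},
    "preferred":      {"I": 0.05, "II": 0.025, "III": 0.4,  "IV": 0.008},
    "more preferred": {"I": 0.04, "II": 0.015, "III": 0.15, "IV": 0.005},
}
TIER_ORDER = ("required", "preferred", "more preferred")


def selectivity_tier(rates):
    """Return the strictest tier met by the rates, or None.

    rates -- dict with cleavage rates for substrates "I".."V"
             (hypothetical layout chosen for this example).
    """
    k_v = rates["V"]
    ratios = {s: rates[s] / k_v for s in ("I", "II", "III", "IV")}
    best = None
    for tier in TIER_ORDER:
        if all(ratios[s] <= LIMITS[tier][s] for s in ratios):
            best = tier
        else:
            break  # stricter tiers cannot be met either
    return best
```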

The serine protease type polypeptide as defined above will
normally have a molecular weight, Mr, of at most 70,000 and
at least 15,000.

One such novel polypeptide according to the invention has the
amino acid sequence SEQ ID NO: 2 or is an analogue and/or
homologue thereof. Other important embodiments of the
polypeptide of the invention have an amino acid sequence
which is a subsequence of SEQ ID NO: 2 or an analogue and/or
homologue of such a subsequence.

By the use of the term "an analogue of a polypeptide encoded
by the DNA sequence" or "an analogue of a polypeptide having
the amino acid sequence" is meant any polypeptide which is
capable of performing as bovine coagulation factor Xa in the
tests mentioned above. Thus, included are also polypeptides
from different sources, such as different mammals or
vertebrates, which vary e.g. to a certain extent in the amino
acid composition, or the post-translational modifications
e.g. glycosylation or phosphorylation, as compared to the
artificial serine protease described in the examples.

The term "analogue" is thus used in the present context to
indicate a protein or polypeptide of a similar amino acid
composition or sequence as the characteristic amino acid
sequence SEQ ID NO: 2 derived from an artificial serine
protease as described in Example 5, allowing for minor
variations that alter the amino acid sequence e.g. deletions,
site-directed mutations, insertions of extra amino acids, or
combinations thereof, to generate artificial serine protease
analogues.



Therefore, in the present description and claims, an analogue
(of a polypeptide) designates a variation of the polypeptide
in which one or several amino acids may have been deleted or
exchanged, and/or amino acids may have been introduced,
provided the enzymatic activity with the above-defined
specificity is retained, as can be assessed as described above.

With respect to homology, an analogue of a polypeptide according
to the invention may have a sequence homology at the
polypeptide level of at least 60% identity compared to the
sequence of a fragment of SEQ ID NO: 2, allowing for deletions
and/or insertions of at most 50 amino acid residues.

Such polypeptide sequences or analogues thereof which have a
homology of at least 60% with the polypeptide shown in SEQ ID
NO: 2 encoded by the DNA sequence of the invention SEQ ID
NO: 1 or analogues and/or homologues thereof, constitute an
important embodiment of this invention.



By the term "sequence homology" is meant the identity in
sequence of either the amino acids in segments of two or more
amino acids in an amino acid sequence, or the nucleotides in
segments of two or more nucleotides in a nucleotide sequence.
With respect to polypeptides, the term is thus intended to
mean a homology between the amino acids of the polypeptides
between which the homology is to be established, the match
being made with respect to both identity and position of the
amino acids of the polypeptides.

The term "homologous" is thus used here to illustrate the
degree of identity between the amino acid sequence of a given
polypeptide and the amino acid sequence shown in SEQ ID NO: 2.
The amino acid sequence to be compared with the amino acid
sequence shown in SEQ ID NO: 2 may be deduced from a
nucleotide sequence such as a DNA or RNA sequence, e.g.
obtained by hybridization as defined in the following, or may
be obtained by conventional amino acid sequencing methods.

Another embodiment relates to a polypeptide having an amino
acid sequence from which a consecutive string of 20 amino
acids is homologous to a degree of at least 40% with a string
of amino acids of the same length selected from the amino
acid sequence shown in SEQ ID NO: 2.
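
One way to read the criterion above is as an ungapped, position-
by-position comparison of every 20-residue window of a candidate
sequence with every 20-residue window of the reference sequence.
The sketch below implements that reading; treating "homologous to
a degree of at least 40%" as positional identity without gaps is an
assumption, and the sequences used to exercise the code are
hypothetical rather than SEQ ID NO: 2.

```python
def window_identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def has_homologous_window(candidate, reference, window=20, threshold=0.40):
    """True if any window of `candidate` matches any window of
    `reference` with at least `threshold` positional identity."""
    for i in range(len(candidate) - window + 1):
        cand_win = candidate[i:i + window]
        for j in range(len(reference) - window + 1):
            if window_identity(cand_win, reference[j:j + window]) >= threshold:
                return True
    return False
```

The search is quadratic in sequence length, which is unproblematic
for polypeptides of the sizes discussed here.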

One serine protease polypeptide according to the invention
has the amino acid sequence of SEQ ID NO: 2, residues 82-484,
or is an analogue and/or homologue thereof. Another serine
protease polypeptide according to the invention has the amino
acid sequence of SEQ ID NO: 2, residues 166-484, or is an
analogue and/or homologue thereof.

A number of modifications of the sequences shown herein are
particularly interesting: the insertion of the cleavage-
directing sequences SEQ ID NO: 38 or 40-42 instead of residues
230-233 in SEQ ID NO: 2, combined with replacement of
cysteine residue 245, preferably by Gly, Ser or Arg, in SEQ ID
NO: 2. Another interesting possibility is insertion of SEQ ID
NO: 38 or 40-42 instead of residues 179-182 in SEQ ID NO: 2.

Quite generally, in any of the artificial serine proteases
defined above, replacement of the cleaving sequence corresponding
to residues 230-233 in SEQ ID NO: 2 with one of the
cleavage-directing sequences defined above will give rise to
extremely useful cleaving enzymes for use in the method
according to the invention, in that these can be selectively
and very efficiently cleaved by enzymes having the specific
enzymatic activity of bovine coagulation factor Xa, and thus
by artificial serine proteases as defined above, including by
molecules identical to themselves. The latter fact means that
artificial serine proteases modified by such insertion of the
specific cleavage-directing sequences can be extremely
effectively activated, as the first molecules cleaved and
activated will be able to cleave other molecules, thus starting a
chain reaction.
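
The chain reaction described above can be pictured with a very
simple autocatalytic rate law, in which the rate of activation is
proportional both to the amount of already active enzyme and to the
amount of remaining proenzyme. The sketch below integrates such a
model by the explicit Euler method. The rate law, the rate constant
and all numerical values are illustrative assumptions and are not
taken from the description or the examples.

```python
def autoactivation(total, seed, k, dt, n_steps):
    """Euler integration of dA/dt = k * A * (total - A).

    total   -- total enzyme (proenzyme + active), arbitrary units
    seed    -- amount of active enzyme present at time zero
    k       -- hypothetical second-order rate constant
    dt      -- time step; keep k * total * dt well below 1
    n_steps -- number of integration steps
    Returns the list of active-enzyme amounts, one per time point.
    """
    active = seed
    trace = [active]
    for _ in range(n_steps):
        active += k * active * (total - active) * dt
        active = min(active, total)  # guard against numerical overshoot
        trace.append(active)
    return trace
```

In this model nothing happens without a seed of active enzyme,
while even a small seed eventually converts essentially all of the
proenzyme, which is the qualitative behaviour referred to in the
text.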

As mentioned above, it is a most important feature that the
artificial serine proteases can be produced by recombinant
DNA techniques, and hence, another important embodiment of
the invention relates to a nucleic acid fragment capable of
encoding a polypeptide as defined above, in particular
a DNA fragment which is capable of encoding an artificial
serine protease polypeptide as defined above.

In one of its aspects, the invention relates to a nucleotide
sequence encoding a polypeptide of the invention as defined
above. In particular, the invention relates to a nucleotide
sequence having the nucleotide sequence shown in the DNA
sequence SEQ ID NO: 1 or an analogue thereof which
has a homology with the DNA sequence shown in SEQ
ID NO: 1 of at least 60%, and/or encodes a polypeptide, the
amino acid sequence of which is at least 60% homologous with
the amino acid sequence shown in SEQ ID NO: 2.

Generally, only coding regions are used when comparing 
nucleotide sequences in order to determine their internal 
homology. 

The term "analogue" with regard to the DNA fragments of the
invention is intended to indicate a nucleotide sequence which
encodes a polypeptide identical or substantially identical to
the polypeptide encoded by a DNA fragment of the invention.
It is well known that the same amino acid may be encoded by
various codons, the codon usage being related, inter alia, to
the preference of the organisms in question expressing the
nucleotide sequence. Thus, one or more nucleotides or codons
of the DNA fragment of the invention may be exchanged by
others which, when expressed, result in a polypeptide identical
or substantially identical to the polypeptide encoded
by the DNA fragment in question.
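
The degeneracy of the genetic code referred to above can be shown
with a short translation routine built on the standard codon
table. The two DNA strings in the usage note are hypothetical
examples chosen to encode the factor Xa recognition sequence IEGR
with different codons; they are not taken from SEQ ID NO: 1.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_same_polypeptide(dna_a, dna_b):
    """True if two coding sequences translate to the same polypeptide."""
    return translate(dna_a) == translate(dna_b)
```

For instance, `translate("ATTGAAGGTCGT")` and
`translate("ATCGAGGGCCGC")` both give `IEGR`, although the two
sequences differ in the third position of every codon.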

Furthermore, the term "analogue" is intended to allow for
variations in the sequence such as substitution, insertion
(including introns), addition and rearrangement of one or
more nucleotides, which variations do not have any substantial
effect on the polypeptide encoded by the DNA fragment.

Thus, within the scope of the present invention is a modified
nucleotide sequence which differs from the DNA sequence shown
in SEQ ID NO: 1 in that at least one nucleotide has been
substituted, added, inserted, deleted and/or rearranged.

The term "substitution" is intended to mean the replacement
of one or more nucleotides in the full nucleotide sequence
with one or more different nucleotides, "addition" is
understood to mean the addition of one or more nucleotides at
either end of the full nucleotide sequence, "insertion" is
intended to mean the introduction of one or more nucleotides
within the full nucleotide sequence, "deletion" is intended
to indicate that one or more nucleotides have been deleted
from the full nucleotide sequence whether at either end of
the sequence or at any suitable point within it, and
"rearrangement" is intended to mean that two or more nucleotide
residues have been exchanged within the DNA or polypeptide
sequence, respectively. The DNA fragment may, however, also
be modified by mutagenesis either before or after inserting
it in the organism. The DNA or protein sequence of the invention
may be modified in such a way that it does not lose any
of its biophysical, biochemical or biological properties, or
part of such properties (one and/or all) or all of such
properties (one and/or all).

An example of a specific analogue of the DNA sequence of the
invention is a DNA sequence which comprises the DNA sequence
shown in SEQ ID NO: 1 and is particularly adapted for expression
in E. coli. This DNA sequence is one which, when inserted in
E. coli together with suitable regulatory sequences, results
in the expression of a polypeptide having substantially the
amino acid sequence shown in SEQ ID NO: 2. Thus, this DNA
sequence comprises specific codons recognized by E. coli.

The terms "fragment", "sequence", "homologue" and "analogue",
as used in the present specification and claims with respect
to fragments, sequences, homologues and analogues according
to the invention should of course be understood as not
comprising these phenomena in their natural environment, but
rather, e.g., in isolated, purified, in vitro or recombinant
form.

One embodiment of the nucleic acid fragment according to the
invention is a nucleic acid fragment as defined above in
which at least 60% of the coding triplets encode the same
amino acids as a nucleic acid fragment of the nucleic acid
which encodes bovine coagulation factor X, allowing for
insertions and/or deletions of at most 150 nucleotides. An
example of such a nucleic acid fragment is SEQ ID NO: 1,
nucleotides 76-1527, and analogues and/or homologues thereof.
Another example is SEQ ID NO: 1, nucleotides 319-1527, and
analogues and/or homologues thereof. Still another example is
SEQ ID NO: 1, nucleotides 571-1527, and analogues and/or
homologues thereof.
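
The three nucleotide ranges quoted above are arithmetically
consistent with residues 1-484, 82-484 and 166-484 of SEQ ID NO: 2
if the codon for residue 1 is taken to begin at nucleotide 76 of
SEQ ID NO: 1. That starting position is an inference from the
numbers and is not stated explicitly in the text; the helper below
merely makes the arithmetic checkable, and its name is chosen for
illustration.

```python
CODING_START = 76  # assumed first nucleotide of the codon for residue 1


def residues_to_nucleotides(first_res, last_res, coding_start=CODING_START):
    """Map an inclusive residue range to an inclusive nucleotide range."""
    if first_res < 1 or last_res < first_res:
        raise ValueError("invalid residue range")
    first_nt = coding_start + 3 * (first_res - 1)
    last_nt = coding_start + 3 * last_res - 1
    return first_nt, last_nt


if __name__ == "__main__":
    for first, last in ((1, 484), (82, 484), (166, 484)):
        print(first, last, residues_to_nucleotides(first, last))
```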

The DNA fragment described above and constituting an important
aspect of the invention may be obtained directly from
the genomic DNA or by isolating mRNA and converting it into
the corresponding DNA sequence by using reverse transcriptase,
thereby producing a cDNA. When obtaining the DNA fragment
from genomic DNA, it is derived directly by screening
for genomic sequences as is well known for the person skilled
in the art. It can be accomplished by hybridization to a DNA
probe designed on the basis of knowledge of the sequences of
the invention, or the sequence information obtained by amino
acid sequencing of a purified serine protease. When the DNA
is of complementary DNA (cDNA) origin, it may be obtained by
preparing a cDNA library with mRNA from cells containing an
artificial serine protease. Hybridization can be accomplished
by a DNA probe designed on the basis of knowledge of the cDNA
sequence, or the sequence information obtained by amino acid
sequencing of a purified artificial serine protease.

The DNA fragment of the invention or an analogue and/or
homologue thereof can be replicated by
fusing it with a vector and inserting the complex into a
suitable microorganism or a mammalian cell line. Alternatively,
the DNA fragment can be manufactured using chemical
synthesis. Also, polymerase chain reaction (PCR) primers can
be synthesized based on the nucleotide sequence shown in SEQ
ID NO: 1. These primers can then be used to amplify the whole
or a part of a sequence encoding an artificial serine
protease polypeptide.

Suitable polypeptides of the invention can be produced using
recombinant DNA technology. More specifically, the polypeptides
may be produced by a method which comprises culturing
or breeding an organism carrying the DNA sequence shown in
SEQ ID NO: 1 or an analogue and/or homologue thereof
under conditions leading to expression of said DNA
fragment, and subsequently recovering the expressed polypeptide
from the said organism.

The organism which is used for the production of the polypeptide
may be a higher organism, e.g. an animal, or a lower
organism, e.g. a microorganism. Irrespective of the type of
organism used, the DNA fragment of the invention (described
above) should be introduced in the organism either directly
or with the help of a suitable vector. Alternatively, the
polypeptides may be produced in mammalian cell lines by
introducing the DNA fragment or an analogue and/or homologue
thereof either directly or with the help of
an expression vector.

The DNA fragment of the invention can also be cloned in a
suitable stable expression vector and then put into a suitable
cell line. The cells expressing the desired polypeptides
are then selected using the conditions suitable for the
vector and the cell line used. The selected cells are then
grown further and form a very important and continuous source
of the desired polypeptides.

Thus, another aspect of the invention relates to an expression
system comprising a nucleic acid fragment as defined
above and encoding an artificial serine protease polypeptide
as defined above, the system comprising a 5'-flanking
sequence capable of mediating expression of said nucleic acid
fragment. The expression system may be a replicable expression
vector carrying the nucleic acid fragment, which vector
is capable of replicating in a host organism or a cell line;
the vector may, e.g., be a plasmid, phage, cosmid,
mini-chromosome or virus; the vector may be one which, when
introduced in a host cell, is integrated in the host cell genome.

Another aspect of the invention relates to an organism which
carries and is capable of replicating the nucleic acid fragment
as defined above. The organism may be a microorganism
such as a bacterium, a yeast, a protozoan, or a cell derived
from a multicellular organism such as a fungus, an insect
cell, a plant cell, a mammalian cell or a cell line.
Particularly interesting host organisms are microorganisms such as a
bacterium of the genus Escherichia, Bacillus or Salmonella.

A further aspect of the invention relates to a method of
producing an artificial serine protease polypeptide as
defined above, comprising the following steps of:

1. inserting a nucleic acid fragment as defined above in
an expression vector,

2. transforming a host organism as defined above with
the vector produced in step 1,

3. culturing the host organism produced in step 2 to
express the polypeptide,

4. harvesting the polypeptide,

5. optionally subjecting the polypeptide to
post-translational modification,

6. if necessary subjecting the polypeptide to the
denaturing/renaturing cycling method according to the
present invention, and

7. optionally subjecting the polypeptide to further
modification to obtain an authentic polypeptide as
defined above.



Further modifications of the polypeptides may for instance be
accomplished by subjecting the polypeptide molecules to
carboxypeptidase A or B, whereby selected amino acid residues
may be removed from the C-terminus of the polypeptide molecules.
This is desirable under circumstances wherein the
optimal folding of the authentic polypeptide molecules is
achieved only when the N-terminus is free and the
cleavage-directing polypeptide (such as SEQ ID NO: 37) thus is placed
C-terminally of the authentic polypeptide. As is known,
carboxypeptidase B cleaves sequentially from the C-terminus,
and only cleaves off basic amino acids, whereas
carboxypeptidase A cleaves off non-basic amino acids. By carefully
designing which residue is adjoined C-terminally to the authentic
polypeptide it is possible to ensure that everything but the
authentic polypeptide is cleaved off by the carboxypeptidases. If the
C-terminus of the authentic polypeptide is a basic amino acid
residue, one should ensure that the C-terminally linked residue
which is to be removed is non-basic, and vice versa. If
one knows the sequence of the amino acid residues from the
C-terminus of the construct to the C-terminus of the authentic
polypeptide, it is possible to alternate between treatments with
the two carboxypeptidases until only the naked, authentic
polypeptide is left. A practical embodiment would be to use
immobilized carboxypeptidases.
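
The alternating treatment described above can be sketched as a
small simulation. It uses the simplified rule given in the text
(carboxypeptidase B removes basic C-terminal residues, here taken
as Lys and Arg, and carboxypeptidase A removes non-basic ones) and
ignores any further substrate preferences of the real enzymes. The
function names and the example sequences are hypothetical.

```python
BASIC = set("KR")  # Lys and Arg, treated as the basic residues


def is_basic(residue):
    return residue in BASIC


def treat(sequence, enzyme):
    """Strip C-terminal residues with one carboxypeptidase.

    enzyme "B" removes consecutive basic residues,
    enzyme "A" removes consecutive non-basic residues.
    """
    if enzyme not in ("A", "B"):
        raise ValueError("enzyme must be 'A' or 'B'")
    want_basic = enzyme == "B"
    end = len(sequence)
    while end > 0 and is_basic(sequence[end - 1]) == want_basic:
        end -= 1
    return sequence[:end]


def plan_treatments(authentic, tail):
    """Order of enzymes needed to remove `tail` from authentic + tail."""
    if not tail:
        return []
    if is_basic(authentic[-1]) == is_basic(tail[0]):
        # The last treatment would digest into the authentic polypeptide.
        raise ValueError("adjoining residue must differ in class from "
                         "the authentic C-terminal residue")
    plan = []
    for residue in reversed(tail):  # work inwards from the C-terminus
        enzyme = "B" if is_basic(residue) else "A"
        if not plan or plan[-1] != enzyme:
            plan.append(enzyme)
    return plan


def trim(authentic, tail):
    """Apply the planned treatments and return the remaining sequence."""
    product = authentic + tail
    for enzyme in plan_treatments(authentic, tail):
        product = treat(product, enzyme)
    return product
```

With the hypothetical authentic sequence `MKTAYIAK` and tail
`GSIEGR`, the plan is one treatment with carboxypeptidase B followed
by one with carboxypeptidase A, after which only `MKTAYIAK`
remains.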

The polypeptide produced may be isolated by a method comprising
one or more steps like affinity chromatography using
immobilized polypeptide or antibodies reactive with said
polypeptide and/or other chromatographic and electrophoretic
procedures.

Also, it will be understood that a polypeptide of the invention
may be prepared by the well known methods of liquid or
solid phase peptide synthesis utilizing the successive coupling
of the individual amino acids of the polypeptide
sequence. Alternatively, the polypeptide can be synthesized
by the coupling of individual amino acids forming fragments
of the polypeptide sequence which are later coupled so as to
result in the desired polypeptide. These methods thus constitute
another interesting aspect of the invention.

The invention also relates to the use of an artificial serine
protease polypeptide as defined above for cleaving
polypeptides at the cleavage site for bovine coagulation
factor Xa, the cleavage site having the amino acid sequence
selected from the group consisting of SEQ ID NO: 38, SEQ ID
NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42, and to the use of
an artificial serine protease polypeptide as defined above
for cleaving polypeptides at the cleavage site for bovine
coagulation factor Xa, the cleavage site having a modified
version of the amino acid sequence selected from the group of
SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45 and SEQ ID NO:
46, which has been converted to a cleavable form as described
further above.

LEGENDS TO FIGURES

Fig. 1: Schematic representation of a segment of a cyclic
denaturation/renaturation time-programme.

Solvent composition is expressed in terms of a binary mixture
of a non-denaturing 'buffer A' and a denaturing 'buffer B' in
terms of relative content of buffer B. Three consecutive
cycles are represented, each consisting of a renaturation
phase 'F' and a denaturation phase 'D'. Changes in level of
denaturing power of the solvent mixture during denaturation
phases in consecutive cycles are denoted 'k'.

Fig. 2: Construction of the expression plasmids pT7H6FX-hβ2m
and pT7H6FX-mβ2m.

The amplified DNA fragments containing the reading frames of
human and murine β2-microglobulin from amino acid residues
Ile1 to Met99, fused at the 5'-end to the nucleotide
sequences encoding the FXa cleavage site (SEQ ID NO: 37),
were cut with the restriction endonucleases Bam HI and Hind
III (purchased from Boehringer, Germany) and ligated with T4
DNA ligase (purchased from Boehringer, Germany) into Bam HI
and Hind III cut pT7H6 using standard procedures.

Fig. 3: Amino acid sequences of human and murine
β2-microglobulin.

A: Predicted amino acid sequence of the full length reading
frame encoding human β2-microglobulin (SEQ ID NO: 49). Amino
acid residue one (Ile) in the processed mature protein is
indicated. B: Predicted amino acid sequence of the full
length reading frame encoding murine β2-microglobulin (SEQ ID
NO: 50). Amino acid residue one (Ile) in the processed mature
protein is indicated.

Fig. 4: Construction of the expression plasmid pT7H6FX-hGH.
The amplified DNA fragment containing the reading frame of
human Growth Hormone from amino acid residues Phe1 to Phe191,
fused at the 5'-end to the nucleotide sequence encoding the
FXa cleavage site IEGR (SEQ ID NO: 38), was cut with the
restriction endonucleases Bam HI and Hind III (purchased from
Boehringer, Germany) and ligated with T4 DNA ligase (purchased
from Boehringer, Germany) into Bam HI and Hind III cut
pT7H6 using standard procedures.

Fig. 5: Amino acid sequence of human Growth Hormone
(Somatotropin).

The predicted amino acid sequence of the full length reading
frame encoding human Growth Hormone (SEQ ID NO: 51). The
first amino acid residue in the processed mature protein
(Phe1) is indicated.

Fig. 6: Construction of the plasmids pT7H6FX-#1, #2, and #3
expressing amino acid residue no. 20 (Ala) to 109 (Arg),
amino acid residue no. 20 (Ala) to 190 (Ala), and amino acid
residue no. 20 (Ala) to 521 (Lys) of the human α2-Macroglobulin
Receptor Protein (α2MR) (SEQ ID NO: 52).

The amplified DNA fragments derived from the reading frame of
the α2MR from #1: amino acid residue no. 20 (Ala) to 109
(Arg), #2: amino acid residue no. 20 (Ala) to 190 (Ala), and
#3: amino acid residue no. 20 (Ala) to 521 (Lys), fused at
the 5'-end to the nucleotide sequence encoding the FXa cleavage
site IEGR (SEQ ID NO: 38), were cut with the restriction
endonucleases Bam HI and Hind III (purchased from Boehringer,
Germany) and ligated with T4 DNA ligase (purchased from
Boehringer, Germany) into Bam HI and Hind III cut pT7H6 using
standard procedures.

Fig. 7: Construction of the plasmids pLcIIMLCH6FX-#4, #5, and
#6 expressing amino acid residue no. 803 (Gly) to 1265 (Asp),
amino acid residue no. 849 (Val) to 1184 (Gln), and amino
acid residue no. 1184 (Gln) to 1582 (Lys) of the human
α2-Macroglobulin Receptor Protein (α2MR) (SEQ ID NO: 52).

The amplified DNA fragments derived from the reading frame of
the α2MR from #4: amino acid residue no. 803 (Gly) to 1265
(Asp), #5: amino acid residue no. 849 (Val) to 1184 (Gln),
and #6: amino acid residue no. 1184 (Gln) to 1582 (Lys),
fused at the 5'-end to the nucleotide sequence encoding the
FXa cleavage site IEGR (SEQ ID NO: 38), were cut with the
restriction endonucleases Bam HI or Bcl I and Hind III (purchased
from Boehringer, Germany) and ligated with T4 DNA
ligase (purchased from Boehringer, Germany) into Bam HI and
Hind III cut pLcIIMLCH6FX using standard procedures.

Fig. 8: Construction of the plasmids pLcIIMLCH6FX-#7, #8, and
#9 expressing amino acid residue no. 803 (Gly) to 1582 (Lys),
amino acid residue no. 2519 (Ala) to 2941 (Ile), and amino
acid residue no. 3331 (Val) to 3778 (Ile) of the human
α2-Macroglobulin Receptor Protein (α2MR) (SEQ ID NO: 52).

The amplified DNA fragments derived from the reading frame of
the α2MR from #7: amino acid residue no. 803 (Gly) to 1582
(Lys), #8: amino acid residue no. 2519 (Ala) to 2941 (Ile),
and #9: amino acid residue no. 3331 (Val) to 3778 (Ile),
fused at the 5'-end to the nucleotide sequence encoding the
FXa cleavage site IEGR (SEQ ID NO: 38), were cut with the
restriction endonucleases Bam HI and Hind III (purchased from
Boehringer, Germany) and ligated with T4 DNA ligase (purchased
from Boehringer, Germany) into Bam HI and Hind III cut
pLcIIMLCH6FX using standard procedures.

Figs. 9a and 9b: Amino acid sequence of human α2-Macroglobulin
Receptor Protein (α2MR) (SEQ ID NO: 52).

The predicted amino acid sequence of the full length reading
frame encoding the α2MR. Amino acid residues present in the
recombinant proteins as N- or C-terminal residues are identified
by their numbers above the α2MR sequence.

Fig. 10: Construction of the expression plasmid
pLcIIMLCH6FX-FXΔγ.

The amplified DNA fragment containing the reading frame of
bovine blood coagulation Factor X from amino acid residue
Ser82 to Trp484 (FXΔγ), fused at the 5'-end to the nucleotide
sequence encoding the FXa cleavage site IEGR (SEQ ID NO: 38),
was cut with the restriction endonucleases Bam HI and Hind
III (purchased from Boehringer, Germany) and ligated with T4
DNA ligase (purchased from Boehringer, Germany) into Bam HI
and Hind III cut pLcIIMLCH6FX using standard procedures.

Fig. 11: Amino acid sequence of bovine blood coagulation
Factor X (FX).

The predicted amino acid sequence of the full length reading
frame encoding bovine FX (SEQ ID NO: 53). The N-terminal
amino acid residue Ser82 and the C-terminal Trp484 residue in
the FXΔγ construct are identified.

Fig. 12: Construction of the expression plasmid pLcIIMLCH6FX-K1.

The amplified DNA fragment containing the reading frame of
human plasminogen kringle 1 (K1) from amino acid residue
Ser82 to Glu162 (numbering as in "Glu"-plasminogen), fused at
the 5'-end to the nucleotide sequence encoding the FXa
cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction
endonucleases Bam HI and Hind III (purchased from Boehringer,
Germany) and ligated with T4 DNA ligase (purchased from
Boehringer, Germany) into Bam HI and Hind III cut pLcIIMLCH6FX
using standard procedures.

Fig. 13: Construction of the expression plasmid pLcIIH6FX-K4.

The amplified DNA fragment containing the reading frame of
human plasminogen kringle 4 (K4) from amino acid residue
Val354 to Ala439 (numbering as in "Glu"-plasminogen), fused at
the 5'-end to the nucleotide sequence encoding the FXa
cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction
endonucleases Bam HI and Hind III (purchased from Boehringer,
Germany) and ligated with T4 DNA ligase (purchased from
Boehringer, Germany) into Bam HI and Hind III cut pLcIIH6FX
using standard procedures.

Fig. 14: Amino acid sequence of human "Glu"-Plasminogen (SEQ
ID NO: 54). The N- and C-terminal amino acid residues in the
K1 and K4 constructs are identified by their numbers in the
sequence.

Fig. 15: SDS-PAGE analysis of production and in vitro folding
of recombinant human β2-microglobulin.

Lane 1: Crude protein extract before application to the
Ni2+NTA-agarose column (reduced sample).

Lane 2: Column flow-through during application of the crude
protein extract onto the Ni2+NTA-agarose column (reduced
sample).

Lane 3: Human β2-microglobulin eluted from the Ni2+NTA-agarose
column after the cyclic folding procedure by the
non-denaturing elution buffer (reduced sample).

Lane 4: Protein markers (Pharmacia, Sweden): From top of gel;
94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa
(reduced sample).

Lane 5: Same as lane 3 (non-reduced sample).

Lane 6: Recombinant human β2-microglobulin after FXa cleavage
and final purification (non-reduced sample).

Fig. 16: SDS-PAGE analysis of in vitro folding of recombinant
human Growth Hormone; hGH (Somatotropin).

Lane 1: Protein markers (Pharmacia, Sweden): From top of gel;
94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa
(reduced sample).

Lane 2: Human hGH eluted from the Ni2+NTA-agarose column
after the cyclic folding procedure by the non-denaturing
elution buffer (non-reduced sample).

Lane 3: Human hGH eluted from the Ni2+NTA-agarose column
after the cyclic folding procedure by the denaturing elution
buffer B from the folding procedure (non-reduced sample).

Lanes 4-18: Fractions collected during the separation of
monomeric hGH-fusion protein from dimer and multimer fusion
proteins after the cyclic folding procedure by ion exchange
chromatography on Q-Sepharose (Pharmacia, Sweden). The
monomeric protein was eluted in a peak well separated from
the peak containing the dimer and multimer proteins
(non-reduced samples).

Fig. 17: SDS-PAGE analysis of in vitro folding of recombinant
kringle 1 and 4 from human plasminogen and recombinant fusion
protein #4 derived from human α2-Macroglobulin Receptor
Protein (α2MR).

Lane 1: Protein markers (Pharmacia, Sweden): From top of gel;
94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa
(reduced sample).

Lane 2: Crude K1-fusion protein extract before application to
the Ni2+NTA-agarose column (reduced sample).

Lane 3: K1-fusion protein eluted from the Ni2+NTA-agarose
column after the cyclic folding procedure by the
non-denaturing elution buffer (reduced sample).

Lane 4: Same as lane 3 (non-reduced sample).

Lane 5: Flow-through from the lysine-agarose column during
application of the K1-fusion protein (non-reduced sample).

Lane 6: K1-fusion protein eluted from the lysine-agarose
column (non-reduced sample).

Lane 7: K4-fusion protein eluted from the Ni2+NTA-agarose
column after the cyclic folding procedure by the
non-denaturing elution buffer (reduced sample).

Lane 8: Same as lane 7 (non-reduced sample).

Lane 9: α2MR#4 fusion protein eluted from the Ni2+NTA-agarose
column after the cyclic folding procedure by the
non-denaturing elution buffer (reduced sample).

Lane 10: Same as lane 9 (non-reduced sample).

Fig. 18: Construction of the expression plasmid pT7H6FX-α2MRBDv.

The amplified DNA fragment containing the reading frame of
human α2-Macroglobulin from amino acid residues Val1299 to
Ala1451, fused at the 5'-end to the nucleotide sequence
encoding the FXa cleavage site IEGR (SEQ ID NO: 38), was cut
with the restriction endonucleases Bam HI and Hind III
(purchased from Boehringer, Germany) and ligated with T4 DNA
ligase (purchased from Boehringer, Germany) into Bam HI and
Hind III cut pT7H6 using standard procedures.

Fig. 19: Amino acid sequence of the receptor-binding domain
of human α2-Macroglobulin (from residue Val1299 to Ala1451)
(SEQ ID NO: 55).

Fig. 20: Construction of the expression plasmid pT7H6FX-TETN.
The amplified DNA fragment containing the reading frame of
mature monomeric human Tetranectin from amino acid residues
Glu1 to Val181, fused at the 5'-end to the nucleotide
sequence encoding the FXa cleavage site IEGR (SEQ ID NO: 38),
was cut with the restriction endonucleases Bam HI and Hind
III (purchased from Boehringer, Germany) and ligated with T4
DNA ligase (purchased from Boehringer, Germany) into Bam HI
and Hind III cut pT7H6 using standard procedures.

Fig. 21: Amino acid sequence of human monomeric Tetranectin.
The predicted amino acid sequence of the full length reading
frame encoding human Tetranectin (SEQ ID NO: 56). The first
amino acid residue in the processed mature protein (Glu1) is
indicated.

Fig. 22: Construction of the expression plasmid pT7H6FX-DB32.
The amplified DNA fragment containing the reading frame of
the artificial diabody DB32 from amino acid residues Gl^ to
Asn246, fused at the 5'-end to the nucleotide sequence
encoding the FXa cleavage site IEGR (SEQ ID NO: 38), was cut with
the restriction endonucleases Bam HI and Hind III (purchased
from Boehringer, Germany) and ligated with T4 DNA ligase
(purchased from Boehringer, Germany) into Bam HI and Hind III
cut pT7H6 using standard procedures.

Fig. 23: Amino acid sequence of the artificial diabody DB32
(SEQ ID NO: 57).

Fig. 24: The expression plasmid pT7H6FX-PS.4.
The construction of pT7H6FX-PS.4 expressing human psoriasin
from amino acid residues Ser2 to Gln101 has previously been
described (Hoffmann, 1994).

Fig. 25: Amino acid sequence of human psoriasin.

The predicted amino acid sequence of the full length reading
frame encoding human psoriasin (SEQ ID NO: 58).



WO 94/18227 



PCT/DK94/00054 




Fig. 26: SDS-PAGE analysis of purification and FXa cleavage
of recombinant Mab 32 diabody.

a: Different stages of the purification.
Lanes 1 and 2: Crude product from folding.
Lane 3: Final purified Mab 32 diabody fusion protein product.
Lane 4: Supernatant of crude folding product after 50-fold
concentration and centrifugation.
Lane 5: Pellet from crude folding product after 50-fold
concentration and centrifugation.

b: FXa cleavage of Mab 32 diabody fusion protein.
Lanes 1 and 5: Final purified Mab 32 diabody fusion protein.
Lane 2: Molar ratio 1:5 FXa:Mab 32 diabody fusion protein at
37°C for 20 hours.
Lane 3: Molar ratio 1:2 FXa:Mab 32 diabody fusion protein at
37°C for 20 hours.
Lane 4: Molar ratio 1:1 FXa:Mab 32 diabody fusion protein at
37°C for 20 hours.



Fig. 27: Suitability of glutathione as reducing agent in
cyclic refolding of human β2-microglobulin fusion protein.
Lane 1: Reduced sample of test no. 1.
Lane 2: Non-reduced sample of test no. 1.
Lane 3: Non-reduced sample of test no. 2.
Lane 4: Non-reduced sample of test no. 3.
Lane 5: Non-reduced sample of test no. 4.
Lane 6: Non-reduced sample of test no. 5.
Lane 7: Non-reduced sample of test no. 6.
Lane 8: Non-reduced sample of test no. 7.
Lane 9: Non-reduced sample of test no. 8.
Lane 10: Non-reduced sample of test no. 9.
Lane 11: Non-reduced sample of test no. 10.
Lane 12: Non-reduced sample of test no. 11.



Fig. 28: Suitability of L-cysteine ethyl ester as reducing
agent in cyclic refolding of human β2-microglobulin fusion
protein.
Lane 1: Reduced sample of test no. 1.
Lane 2: Non-reduced sample of test no. 1.
Lane 3: Non-reduced sample of test no. 2.
Lane 4: Non-reduced sample of test no. 3.
Lane 5: Non-reduced sample of test no. 4.
Lane 6: Non-reduced sample of test no. 5.
Lane 7: Non-reduced sample of test no. 6.
Lane 8: Non-reduced sample of test no. 7.
Lane 9: Non-reduced sample of test no. 8.
Lane 10: Non-reduced sample of test no. 9.

Fig. 29: Suitability of 2-Mercaptoethanol as reducing agent
in cyclic refolding of human β2-microglobulin fusion protein.

Fig. 30: Suitability of Mercaptosuccinic acid as reducing
agent in cyclic refolding of human β2-microglobulin fusion
protein.

Fig. 31: Suitability of N-Acetyl-L-cysteine as reducing agent
in cyclic refolding of human β2-microglobulin fusion protein.
Lane 1: Reduced sample of test no. 1.
Lane 2: Non-reduced sample of test no. 1.
Lane 3: Non-reduced sample of test no. 2.
Lane 4: Non-reduced sample of test no. 3.
Lane 5: Non-reduced sample of test no. 4.
Lane 6: Non-reduced sample of test no. 5.
Lane 7: Non-reduced sample of test no. 6.
Lane 8: Non-reduced sample of test no. 7.
Lane 9: Non-reduced sample of test no. 8.
Lane 10: Non-reduced sample of test no. 9.

Fig. 32: SDS-PAGE analysis of cyclic refolding of human
β2-microglobulin fusion protein.

Lane 1: Crude protein extract before application to the
Ni2+NTA-agarose column (reduced sample).

Lane 2: 8 µl sample of soluble fraction of refolded hβ2m as
described in EXAMPLE 1.

Lane 3: 4 µl sample of soluble fraction of refolded hβ2m as
described in EXAMPLE 1.

Lane 4: 2 µl sample of soluble fraction of refolded hβ2m as
described in EXAMPLE 1.

Lane 5: 8 µl sample of insoluble fraction of refolded hβ2m as
described in EXAMPLE 1.

Lanes 6 and 7: hβ2m final product after purification by ion
exchange chromatography.

Lanes 8 and 9: Refolded hβ2m after optimized refolding
protocol as described in EXAMPLE 13.

Fig. 33: SDS-PAGE analysis of refolding of human
β2-microglobulin fusion protein by buffer step and linear gradient.

Lane 1: Sample from soluble fraction of refolded hβ2m, folded
by the buffer step protocol as described in EXAMPLE 13.

Lanes 2 and 3: Sample of insoluble fraction of refolded hβ2m,
folded by the buffer step protocol as described in EXAMPLE
13.

Lane 4: Protein molecular weight markers (Pharmacia, Sweden):
From top of gel; 94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa,
and 14.4 kDa (reduced sample).

Lane 5: Sample of soluble fraction of refolded hβ2m, folded
by the linear gradient protocol as described in EXAMPLE 13.

Lanes 6 and 7: Sample of insoluble fraction of refolded hβ2m,
folded by the linear gradient protocol as described in
EXAMPLE 13.

Fig. 34: The general scheme of the design of the fusion
proteins described in the examples.

At the N-terminal end of the fusion protein, a "booster
segment" is optionally inserted to enhance the level of
expression of the fusion protein in the cell expressing the DNA
encoding the fusion protein. C-terminally to this, the "6H"
indicates the 6 histidinyl residues which constitute an ion
chelating site used as an "affinity handle" during
purification and refolding of the fusion proteins. The "FX" at the
C-terminal of the 6 histidinyl site is the FXa cleavage site.
Finally, the part of the fusion protein denoted "protein"
represents the protein which is going to be refolded
according to the method of the invention.

EXAMPLES 

Examples 1 to 11 given in this section, which are used to
exemplify the "cyclic folding procedure", all describe the
process of folding a recombinant cleavable hybrid protein
(fusion protein) produced in E. coli, purified from a crude
protein extract and subjected to folding without further
purification by one general procedure.

The nucleotide sequence encoding the recombinant protein,
which is to be produced, is at the 5'-end fused to a
nucleotide sequence encoding an amino acid sequence specifying a
FXa cleavage site (FX), in turn linked N-terminally to a
segment containing six histidinyl residues (SEQ ID NO: 47).
The linking of the FXa cleavage site is normally achieved
during a Polymerase Chain Reaction, wherein the 5'-terminal
primer comprises nucleotides encoding this sequence. The
linking of the six histidinyl residues is normally obtained
by employing a vector which comprises a nucleotide fragment
encoding SEQ ID NO: 47. The six histidinyl residues
constitute a metal ion chelating site, which is utilized as an
affinity handle during purification of the fusion protein and
subsequently as the point of contact to the solid matrix
during the cyclic folding process. Occasionally 'booster
segments' (e.g. a segment derived from the N-terminus of the
λcII protein, in some cases followed by a segment derived from
myosin light chain) are inserted N-terminal to the affinity
handle in order to improve the level of expression of the
fusion protein in E. coli.
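
The layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag string is the sequence quoted in Example 1 as SEQ ID NO: 48, IEGR is the FXa site named in the figure legends, and the function names, the placeholder protein and the rule "cut immediately after the first IEGR" are hypothetical simplifications, not part of the described method.

```python
from typing import Tuple

# Sequence quoted in Example 1 (SEQ ID NO: 48): Met-Gly-Ser, six His,
# Gly-Ser, then the FXa recognition sequence IEGR (SEQ ID NO: 38).
HANDLE_AND_SITE = "MGSHHHHHHGSIEGR"
FXA_SITE = "IEGR"


def build_fusion(protein: str, booster: str = "") -> str:
    """Hypothetical helper: booster (optional) + affinity handle/FXa site + protein."""
    return booster + HANDLE_AND_SITE + protein


def cleave_after_fxa_site(fusion: str) -> Tuple[str, str]:
    """Split the fusion directly C-terminal to the first IEGR motif (simplification)."""
    idx = fusion.find(FXA_SITE)
    if idx < 0:
        raise ValueError("no FXa recognition site found")
    cut = idx + len(FXA_SITE)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # "AAAA" is a placeholder, not a real protein sequence.
    tail, released = cleave_after_fxa_site(build_fusion("AAAA"))
    print(tail, released)
```

The sketch ignores any IEGR-like motif that might occur inside a booster segment or the protein itself; a real design would have to check for that.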

The fusion proteins are all designed according to the same
general scheme (cf. fig. 34). The presence of booster
segments, affinity handle and FXa cleavage site might complicate
refolding of the recombinant protein of interest.
Furthermore, the cyclic folding process is initiated immediately
after the affinity purification of the fusion protein. This
means that fusion protein material which has been partially
degraded by the E. coli host is retained on the affinity
matrix in addition to the full-length fusion protein.
This degraded fusion protein may well interfere severely with
refolding of the full-length fusion protein, thereby reducing
the apparent efficiency of the process. The folding
efficiency results reported in Examples 1 to 11 therefore cannot
directly be compared to the efficiency of the process of
refolding a purified fusion protein.

Examples 1 to 11 describe the refolding procedure for 21
different proteins, protein domains or domain-clusters,
ranging from a size of 82 amino acids (K1, Example 6) to 780
amino acids (α2MR#7, Example 4), and the number of disulphide
bridges in the proteins ranges from zero (α2MRAP, Example 3)
to 33 (α2MR#4, Example 4) and 36 (α2MR#7, Example 4).

The efficiency of the refolding of the proteins ranges from
15 to 95%, and the yield of active protein lies in the order
of 10-100 mg for refolding on a 40 ml Ni2+NTA-agarose column
(NTA denotes a substituted nitrilotriacetic acid).

The following tables 1-5 demonstrate the gradient profiles
used in the examples. "Time" is given in minutes and "flow"
in ml/min.

TABLE 3

Step   Time   Flow      %A      %B
1       0.0    1.0     0.0   100.0
2      10.0    1.0     0.0   100.0
3      40.0    1.0   100.0     0.0
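
To make such a gradient profile concrete, the sketch below computes the percentage of buffer B at an arbitrary time from the three recoverable rows of Table 3. It assumes that the gradient manager ramps linearly between consecutive steps, which the text does not state; the function name is hypothetical.

```python
from bisect import bisect_right

# Recoverable rows of Table 3: (time in min, flow in ml/min, %A, %B).
TABLE_3 = [
    (0.0, 1.0, 0.0, 100.0),
    (10.0, 1.0, 0.0, 100.0),
    (40.0, 1.0, 100.0, 0.0),
]


def percent_b(t, profile=TABLE_3):
    """Return %B at time t, assuming linear ramps between listed steps."""
    times = [row[0] for row in profile]
    if t <= times[0]:
        return profile[0][3]
    if t >= times[-1]:
        return profile[-1][3]
    i = bisect_right(times, t) - 1          # index of the step at or before t
    t0, _, _, b0 = profile[i]
    t1, _, _, b1 = profile[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    for minute in (0, 10, 25, 40):
        print(minute, percent_b(minute))
```

Under this assumption the column sees 100% buffer B for the first 10 minutes and reaches 100% buffer A at 40 minutes.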
EXAMPLE 1

Production and Folding of Human and Murine β2-microglobulin

This example describes the production in E. coli of both
human β2-microglobulin and murine β2-microglobulin as FXa
cleavable fusion proteins, and the purification of the
recombinant human and murine β2-microglobulin after FXa cleavage.

Plasmid clones containing the full length cDNAs encoding the
human and the murine β2-microglobulin proteins (generously
provided by Dr. David N. Garboczi to Dr. Søren Buus) were
used as templates in a Polymerase Chain Reaction (PCR) (Saiki
et al., 1988) designed to produce cDNA fragments
corresponding to the mature human (corresponding to amino acid residue
Ile1 to Met99) and the mature murine (corresponding to amino
acid residue Ile1 to Met99) β2-microglobulin proteins, by use
of the primers SEQ ID NO: 3 and SEQ ID NO: 4 (for the human
β2-microglobulin) and SEQ ID NO: 5 and SEQ ID NO: 6 (for the
murine β2-microglobulin). The amplified coding reading frames
were at their 5'-ends, via the PCR reaction, linked to
nucleotide sequences, included in SEQ ID NO: 3 and 5,
encoding the amino acid sequence SEQ ID NO: 37, which constitutes
a cleavage site for the bovine restriction protease FXa
(Nagai and Thøgersen, 1987). The amplified DNA fragments were
subcloned into the E. coli expression vector pT7H6
(Christensen et al., 1991). The construction of the resulting
plasmids pT7H6FX-hβ2m (expressing human β2-microglobulin) and
pT7H6FX-mβ2m (expressing murine β2-microglobulin) is outlined
in fig. 2, and the amino acid sequences of the expressed
proteins are shown in fig. 3 (the amino acid sequences encoded
by the full length reading frames are shown in SEQ ID NO: 49
(human) and SEQ ID NO: 50 (murine)).

Human and murine β2-microglobulin were produced by growing
and expressing the plasmids pT7H6FX-hβ2m and -mβ2m in E. coli
BL21 cells in a medium scale (2 x 1 litre) as described by
Studier and Moffatt, J. Mol. Biol., 189: 113-130, 1986.
Exponentially growing cultures at 37°C were infected at OD600 0.8
with bacteriophage λCE6 at a multiplicity of approximately 5.
Cultures were grown at 37°C for another three hours before
cells were harvested by centrifugation. Cells were lysed by
osmotic shock and sonication and total cellular protein
extracted into phenol (adjusted to pH 8 with Trizma base).
Protein was precipitated from the phenol phase by addition of
2.5 volumes of ethanol and centrifugation. The protein pellet
was dissolved in a buffer containing 6 M guanidinium
chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol.
Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden)
into 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 10 mM
2-mercaptoethanol and 3 mM methionine, the crude protein
preparation was applied to Ni2+ activated NTA-agarose columns for
purification (Hochuli et al., 1988) of the fusion proteins,
MGSHHHHHHGSIEGR-human and murine β2-microglobulin (wherein
MGSHHHHHHGSIEGR is SEQ ID NO: 48) respectively, and
subsequently to undergo the cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed
under vacuum prior to addition of reductant and/or use.

Ni2+ activated NTA-agarose matrix (Ni2+NTA-agarose) is
commercially available from Diagen GmbH, Germany. During the
course of this work it was found, however, that this
commercial product did not perform as well as expected. Our
observations were that the commercial Ni2+NTA-agarose matrix was
easily blocked when applying the denatured and reduced total
protein extract, that the capacity for fusion protein was
lower than expected, and that the matrix could only be
regenerated successfully a few times over.

In order to improve the performance of the Ni2+NTA-agarose it
was decided to perform a carbodiimide coupling of the
N-(5-amino-1-carboxypentyl)iminodiacetic acid metal ligand
(synthesis route as described by Döbeli & Hochuli (EPO 0253 303))
to a more rigid agarose matrix (i.e. Sepharose CL-6B,
Pharmacia, Sweden):

8 g of N-(5-amino-1-carboxypentyl)iminodiacetic acid from
the synthesis procedure in 50 ml was adjusted to pH 10 by
addition of 29 g of Na2CO3 (10 H2O) and added to a stirred
suspension of activated Sepharose CL-6B in 1 M Na2CO3.
Reaction was allowed overnight.

The Sepharose CL-6B (initially 100 ml suspension) was
activated after removal of water by acetone with 7 g of
1,1'-carbonyldiimidazole under stirring for 15 to 30 min. Upon
activation the Sepharose CL-6B was washed with acetone
followed by water and 1 M Na2CO3. The NTA-agarose matrix was
loaded into a column and "charged" with Ni2+ by slowly
passing through 5 column volumes of a 10% NiSO4 solution. The
amount of Ni2+ on the NTA-agarose matrix, prepared by this
procedure, has been determined to be 14 µmoles per ml matrix.
The Ni2+NTA-agarose matrix was packed in a standard glass
column for liquid chromatography (internal diameter: 2.6 cm)
to a volume of 40 ml. After charging, the Ni2+NTA-agarose
column was washed with two column volumes of water, one
column volume of 1 M Tris-HCl pH 8 and two column volumes of
loading buffer before application of the crude protein
extract.
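
The figures in this paragraph imply a total Ni2+ load and a bed height for the column, which can be worked out as in the sketch below. The numbers are taken from the text; treating the bed as an ideal cylinder and the constant and function names are assumptions made for illustration.

```python
import math

NI_UMOL_PER_ML = 14.0        # µmol Ni2+ per ml matrix (from the text)
BED_VOLUME_ML = 40.0         # packed bed volume (from the text)
INTERNAL_DIAMETER_CM = 2.6   # column internal diameter (from the text)


def total_ni_umol() -> float:
    """Total Ni2+ on the packed column, in µmol."""
    return NI_UMOL_PER_ML * BED_VOLUME_ML


def bed_height_cm() -> float:
    """Bed height of an ideal cylindrical bed (1 ml = 1 cm^3)."""
    area_cm2 = math.pi * (INTERNAL_DIAMETER_CM / 2.0) ** 2
    return BED_VOLUME_ML / area_cm2


def wash_volume_ml(column_volumes: float) -> float:
    """Convert a number of column volumes into millilitres."""
    return column_volumes * BED_VOLUME_ML


if __name__ == "__main__":
    print(total_ni_umol())            # µmol Ni2+ on the column
    print(round(bed_height_cm(), 2))  # cm
    # Washes listed above: 2 CV water + 1 CV Tris-HCl + 2 CV loading buffer.
    print(wash_volume_ml(2 + 1 + 2))  # ml
```

With these inputs the column carries about 560 µmol of Ni2+ in a bed roughly 7.5 cm high, and the three equilibration washes together amount to 200 ml.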

Upon application of the crude protein extracts on the
Ni2+NTA-agarose column, the fusion proteins, MGSHHHHHHGSIEGR-hβ2m
and MGSHHHHHHGSIEGR-mβ2m (wherein MGSHHHHHHGSIEGR is SEQ
ID NO: 48) respectively, were purified from the majority of
E. coli and λ phage proteins by washing with one column volume
of the loading buffer followed by 6 M guanidinium chloride,
50 mM Tris-HCl, 10 mM 2-mercaptoethanol, and 3 mM methionine
until the optical density (OD) at 280 nm of the column
eluates was stable.

The fusion proteins were refolded on the Ni2+NTA-agarose
column using a gradient manager profile as described in table
1 and 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 1.2 mM/0.4 mM
reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M
NaCl, 50 mM Tris-HCl pH 8, 3 mM methionine, and 6 mM reduced
glutathione as buffer B. The reduced/oxidized glutathione
solution was freshly prepared as a 200 times stock solution
by addition of 9.9 M H2O2 to a stirred solution of 0.2 M
reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the hβ2m and
mβ2m fusion proteins were eluted from the Ni2+NTA-agarose
columns with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl,
20 mM EDTA pH 8.

Fusion protein that was aggregated and precipitated on the
Ni2+NTA-agarose columns was eluted in buffer B.

Approximately 75% of the fusion protein material was eluted
by the non-denaturing elution buffer (see Fig. 16, lanes 2 and
3).

As judged by non-reducing SDS-PAGE analysis approximately 70%
of the soluble hβ2m fusion protein material (corresponding
to 40 mg of hβ2m fusion protein) appeared monomeric (see Fig.
15, lanes 5 and 3) whereas 25% of the mβ2m fusion protein
appeared monomeric (corresponding to 20 mg of mβ2m fusion
protein). The overall efficiency of the folding procedure is
therefore approximately 50% for the hβ2m fusion protein and
less than 20% for the mβ2m fusion protein.
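
The overall efficiencies quoted here follow from multiplying the fraction eluted by the non-denaturing buffer with the fraction of that material judged monomeric. The sketch below reproduces the arithmetic; it assumes the 75% elution figure applies to both the human and the murine fusion protein, and the function name is hypothetical.

```python
def overall_efficiency(eluted_fraction: float, monomer_fraction: float) -> float:
    """Fraction of applied fusion protein recovered as soluble monomer."""
    return eluted_fraction * monomer_fraction


# Fractions taken from the text of Example 1 (elution figure assumed
# to hold for both proteins).
human = overall_efficiency(0.75, 0.70)
murine = overall_efficiency(0.75, 0.25)

if __name__ == "__main__":
    print(f"human:  {human:.1%}")
    print(f"murine: {murine:.1%}")
```

The products, about 52% and about 19%, agree with the "approximately 50%" and "less than 20%" stated in the text.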

Monomeric hβ2m and mβ2m fusion proteins were purified from
dimer and higher order multimers by ion exchange
chromatography on S-Sepharose (Pharmacia, Sweden): The fusion proteins
eluted by the non-denaturing elution buffer (approximately 70%
of the fusion protein material) were gel filtered into a
buffer containing 5 mM NaCl and 5 mM Tris-HCl pH 8 on
Sephadex G-25 and diluted 1:1 with water before being applied onto
the S-Sepharose ion exchange columns. Fusion proteins were eluted
over 5 column volumes with a linear gradient from 2.5 mM NaCl,
2.5 mM Tris-HCl pH 8 to 100 mM NaCl, 25 mM Tris-HCl pH 8. The
monomeric hβ2m as well as mβ2m fusion proteins eluted in the
very beginning of the gradient, whereas dimers and higher
order multimers eluted later. Fractions containing the
monomeric fusion proteins were diluted with water and
reloaded to the S-Sepharose columns and one-step eluted in 1
M NaCl, 50 mM Tris-HCl pH 8.

The monomeric fusion proteins were cleaved with the
restriction protease FXa overnight at room temperature in a weight
to weight ratio of approximately 200 to one.
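
As a rough worked example of the weight ratio, the sketch below estimates the mass of FXa needed for a given mass of fusion protein. It assumes the ratio is fusion protein to FXa (200:1), and the 40 mg input is simply the hβ2m figure quoted earlier in this example; the function name is hypothetical.

```python
def fxa_mass_mg(fusion_protein_mg: float, weight_ratio: float = 200.0) -> float:
    """Mass of FXa (mg) for a fusion-protein:FXa weight ratio (assumed direction)."""
    if weight_ratio <= 0:
        raise ValueError("weight_ratio must be positive")
    return fusion_protein_mg / weight_ratio


if __name__ == "__main__":
    # 40 mg of monomeric fusion protein at 200:1 (w/w).
    print(fxa_mass_mg(40.0))
```

Passing `weight_ratio=100.0` gives the corresponding estimate for a 100:1 digest.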

After cleavage the recombinant hβ2m and mβ2m proteins were
purified from the N-terminal fusion tail, liberated from the
cleaved fusion protein, and FXa by ion exchange chromatography
on Q-Sepharose columns (Pharmacia, Sweden): Upon gel
filtration on Sephadex G-25 into 5 mM NaCl, 5 mM Tris-HCl pH 8 and
1:1 dilution with water, recombinant hβ2m and mβ2m were
eluted in a linear gradient (over 5 column volumes) from 2.5
mM NaCl, 2.5 mM Tris-HCl pH 8 to 100 mM NaCl, 25 mM Tris-HCl
pH 8. Fractions containing the cleaved recombinant proteins
were diluted with water and reloaded to the Q-Sepharose
columns and one-step eluted in 1 M NaCl, 50 mM Tris-HCl pH 8.
Recombinant hβ2m and mβ2m proteins were gel filtered into
freshly prepared 20 mM NH4HCO3 and lyophilized twice.

SDS-PAGE analysis of the production of recombinant human
β2-microglobulin is presented in fig. 15.

The yield of fully processed recombinant human
β2-microglobulin produced by this procedure was 30 mg.

The yield of fully processed recombinant murine
β2-microglobulin produced by this procedure was 10 mg.
Comparison of recombinant human with purified natural human
β2-microglobulin was kindly carried out by
Dr. Søren Buus in two different assays:

1. It was found that recombinant human β2-microglobulin and
natural human β2-microglobulin reacted with both a monoclonal
and a monospecific antibody with identical affinity.

2. Recombinant human β2-microglobulin and natural human
β2-microglobulin were, in a binding inhibition experiment using
radiolabelled ligands, found to bind natural affinity purified
heavy chain class I Kd molecules with an identical affinity.

Recombinant murine β2-microglobulin was found to bind natural
class I heavy chain molecules with an affinity 5 times lower
than the human β2-microglobulin. This result is in good
agreement with previous results from the literature using
natural material.



EXAMPLE 2

Production and folding of Human Growth Hormone (Somatotropin)

This example describes the production in E. coli of human
growth hormone (hGH) as a FXa cleavable fusion protein, and
the purification of the recombinant hGH after FXa cleavage.

A plasmid clone containing the cDNA encoding the hGH
(generously provided by Dr. Henrik Dalbøge (Dalbøge et al., 1987))
was used as template in a Polymerase Chain Reaction (PCR)
(Saiki et al., 1988), using the primers SEQ ID NO: 7 and SEQ
ID NO: 8, designed to produce a cDNA fragment corresponding
to the mature hGH (corresponding to amino acid residue Phe1
to Phe191) protein. The amplified coding reading frame was at
the 5'-end, via the PCR reaction, linked to a nucleotide
sequence, included in SEQ ID NO: 7, encoding the amino acid
sequence SEQ ID NO: 37, which constitutes a cleavage site for
the bovine restriction protease FXa (Nagai and Thøgersen,
1987). The amplified DNA fragment was subcloned into the E.
coli expression vector pT7H6 (Christensen et al., 1991). The
construction of the resulting plasmid pT7H6FX-hGH (expressing
human Growth Hormone) is outlined in fig. 4, and the amino
acid sequence of the expressed protein is shown in fig. 5
(the amino acid sequence encoded by the full length reading
frame is shown in SEQ ID NO: 51).

Recombinant human Growth Hormone was produced by growing and
expressing the plasmid pT7H6FX-hGH in E. coli BL21 cells in a
medium scale (2 x 1 litre) as described by Studier and
Moffatt, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing
cultures at 37°C were infected at OD600 0.8 with
bacteriophage λCE6 at a multiplicity of approximately 5.
Cultures were grown at 37°C for another three hours before
cells were harvested by centrifugation. Cells were lysed by
osmotic shock and sonication and total cellular protein
extracted into phenol (adjusted to pH 8 with Trizma base).
Protein was precipitated from the phenol phase by addition of
2.5 volumes of ethanol and centrifugation. The protein pellet
was dissolved in a buffer containing 6 M guanidinium
chloride, 50 mM Tris-HCl pH 8 and 50 mM dithioerythritol.
Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden)
into 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 5 mM
2-mercaptoethanol and 1 mM methionine, the crude protein preparation
was applied to a Ni2+ activated NTA-agarose column
(Ni2+NTA-agarose) for purification (Hochuli et al., 1988) of the
fusion protein, MGSHHHHHHGSIEGR-hGH (wherein MGSHHHHHHGSIEGR
is SEQ ID NO: 48), and subsequently to undergo the cyclic
folding procedure.

Preparation and "charging" of the Ni2+NTA-agarose column is
described under Example 1.

All buffers prepared for liquid chromatography were degassed
under vacuum prior to addition of reductant and/or use.

Upon application of the crude protein extract to the
Ni2+NTA-agarose column, the fusion protein, MGSHHHHHHGSIEGR-hGH
(wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), was purified from
the majority of E. coli and λ phage proteins by washing with
one column volume of the loading buffer followed by 6 M
guanidinium chloride, 50 mM Tris-HCl, 5 mM 2-mercaptoethanol,
and 1 mM methionine until the optical density (OD) at 280 nm
of the eluate was stable.

The fusion protein was refolded on the Ni2+NTA-agarose column
using a gradient manager profile as described in table 2, with
0.5 M NaCl, 50 mM Tris-HCl pH 8, and 1.0 mM/0.1 mM
reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M
NaCl, 50 mM Tris-HCl pH 8, 1 mM methionine, and 5 mM reduced
glutathione as buffer B. The reduced/oxidized glutathione
solution was freshly prepared as a 200 times stock solution
by addition of 9.9 M H2O2 to a stirred solution of 0.2 M
reduced glutathione before addition to buffer A.
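
The peroxide addition described above can be checked with simple
stoichiometry. The following is only an illustrative sketch, not
part of the described procedure. It assumes that the standard
reaction H2O2 + 2 GSH -> GSSG + 2 H2O runs to completion, it
ignores the small volume change on adding peroxide, and the 10 ml
stock volume and the function name are hypothetical.

```python
H2O2_MOLARITY = 9.9         # mol/l, from the text
GSH_START_MM = 200.0        # 0.2 M reduced glutathione, from the text
STOCK_FOLD = 200            # "200 times stock solution", from the text
GSSG_IN_BUFFER_A_MM = 0.1   # oxidized glutathione in buffer A, from the text


def h2o2_volume_ul(stock_volume_ml: float, gssg_stock_mm: float) -> float:
    """Microlitres of 9.9 M H2O2 needed to form the requested GSSG.

    ASSUMPTION: H2O2 + 2 GSH -> GSSG + 2 H2O goes to completion,
    so one mole of H2O2 gives one mole of GSSG.
    """
    gssg_umol = gssg_stock_mm * stock_volume_ml   # mM x ml = umol
    return gssg_umol / H2O2_MOLARITY              # umol / (umol/ul) = ul


# Target GSSG concentration in the 200x stock (0.1 mM x 200).
gssg_stock_mm = GSSG_IN_BUFFER_A_MM * STOCK_FOLD
# Each GSSG formed consumes two GSH.
gsh_left_mm = GSH_START_MM - 2.0 * gssg_stock_mm

volume = h2o2_volume_ul(10.0, gssg_stock_mm)      # hypothetical 10 ml stock
print(f"GSSG in stock: {gssg_stock_mm:.0f} mM")
print(f"H2O2 to add to 10 ml stock: {volume:.1f} ul")
print(f"GSH remaining in stock: {gsh_left_mm:.0f} mM")
print(f"GSH after {STOCK_FOLD}x dilution: {gsh_left_mm / STOCK_FOLD:.2f} mM")
```

Under these assumptions the reduced glutathione remaining after
oxidation is somewhat below the starting 0.2 M, so the 1.0 mM
figure quoted for buffer A is best read as a nominal value.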

After completion of the cyclic folding procedure the hGH
fusion protein was eluted from the Ni2+NTA-agarose column
with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM
EDTA pH 8. Fusion protein that was aggregated and precipitated
on the Ni2+NTA-agarose column was eluted in buffer B.

Approximately 80% of the fusion protein material was eluted
by the non-denaturing elution buffer (see Fig. 16, lanes 2
and 3). As judged by non-reducing SDS-PAGE analysis, 90% of
the soluble fusion protein material (corresponding to
approximately 70 mg of fusion protein) appeared monomeric (see
Fig. 16, lane 2), yielding an overall efficiency of the folding
procedure of approximately 70%.
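
The overall efficiency quoted above is the product of the two
fractions reported. A minimal sketch of that arithmetic follows;
the function name is illustrative and not taken from the text.

```python
def overall_folding_efficiency(fraction_eluted: float,
                               fraction_monomeric: float) -> float:
    """Fraction of applied fusion protein recovered as soluble monomer."""
    for fraction in (fraction_eluted, fraction_monomeric):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return fraction_eluted * fraction_monomeric


# Values from the text: about 80% eluted by the non-denaturing
# buffer, of which about 90% appeared monomeric.
hgh_efficiency = overall_folding_efficiency(0.80, 0.90)
print(f"hGH fusion protein: {hgh_efficiency:.0%}")  # prints "hGH fusion protein: 72%"
```

The product, 72%, agrees with the stated figure of approximately
70%.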

Monomeric hGH fusion protein was purified from dimers and
higher order multimers by ion exchange chromatography on
Q-Sepharose (Pharmacia, Sweden): after gel filtration on
Sephadex G-25 into a buffer containing 25 mM NaCl and 25 mM
Tris-HCl pH 8, the fusion protein material eluted by the
non-denaturing buffer was applied to a Q-Sepharose ion
exchange column. Fusion protein was eluted over 5 column
volumes with a linear gradient from 25 mM NaCl, 25 mM
Tris-HCl pH 8 to 200 mM NaCl, 50 mM Tris-HCl pH 8. The monomeric
hGH fusion protein eluted at the beginning of the gradient,
whereas dimers and higher order multimers eluted later.
NiSO4 and iminodiacetic acid (IDA, adjusted to pH 8 with
NaOH) were added to 1 mM to fractions containing the pure
monomeric fusion protein, which was then cleaved with the
restriction protease FXa for 5 hours at 37°C at a weight to
weight ratio of approximately 100 to one. FXa was inhibited
after cleavage by addition of benzamidine hydrochloride to 1 mM.
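
For orientation, the buffer composition at any point of the linear
gradient described above follows from simple interpolation between
the start and end buffers. The sketch below is illustrative only.
It assumes an ideal linear gradient with no mixing delay, and the
function name is hypothetical.

```python
def gradient_concentration(cv_elapsed: float, start_mm: float,
                           end_mm: float, gradient_cv: float) -> float:
    """Concentration (mM) after cv_elapsed column volumes of a linear gradient."""
    if not 0.0 <= cv_elapsed <= gradient_cv:
        raise ValueError("cv_elapsed must lie within the gradient")
    return start_mm + (end_mm - start_mm) * cv_elapsed / gradient_cv


# Gradient from the text: 25 mM NaCl, 25 mM Tris-HCl to
# 200 mM NaCl, 50 mM Tris-HCl over 5 column volumes.
for cv in range(6):
    nacl = gradient_concentration(cv, 25.0, 200.0, 5.0)
    tris = gradient_concentration(cv, 25.0, 50.0, 5.0)
    print(f"{cv} CV: {nacl:6.1f} mM NaCl, {tris:5.1f} mM Tris-HCl")
```

A species that elutes "at the beginning of the gradient" therefore
leaves the column while the NaCl concentration is still close to
25 mM.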

After cleavage, the solution was gel filtered on Sephadex G-25
into 8 M urea, 50 mM Tris-HCl pH 8 to remove Ni2+IDA and
benzamidine. The recombinant hGH protein was then isolated from
uncleaved fusion protein and the liberated fusion tail by
passage through a small Ni2+NTA-agarose column followed in-line
by a small Nd3+NTA-agarose column and subsequently a
non-Ni2+-activated NTA-agarose column to ensure complete
removal of FXa and of Ni2+ and Nd3+, respectively. Recombinant
hGH was purified from a minor fraction of recombinant breakdown
product by ion exchange chromatography on Q-Sepharose: hGH was
eluted in a linear gradient (over 5 column volumes) from 8 M
urea, 50 mM Tris-HCl pH 8 to 8 M urea, 250 mM NaCl, 25 mM
Tris-HCl pH 8. Fractions containing the cleaved, purified
recombinant protein were gel filtered into freshly prepared
20 mM NH4HCO3 and lyophilized twice.

SDS-PAGE analysis of the production and folding of recombinant
human growth hormone is presented in fig. 16.

The yield of fully processed recombinant human growth hormone 
produced by this procedure was 10 mg. 

The recombinant human growth hormone produced by this
procedure co-migrated, both in reducing and non-reducing
SDS-PAGE and in non-denaturing PAGE analysis, with biologically
active recombinant human growth hormone generously provided by
Novo-Nordisk A/S.

EXAMPLE 3 

Production and folding of human α2MRAP

The plasmid used for expression in E. coli BL21 cells of the
human α2-Macroglobulin Receptor Associated Protein (α2MRAP),
pT7H6FX-α2MRAP, and the conditions used for production of the
fusion protein have previously been described by us in Nykjær
et al., J. Biol. Chem. 267: 14543-14546, 1992. The primers
SEQ ID NO: 9 and SEQ ID NO: 10 were used in the PCR employed
for amplifying the α2MRAP-encoding DNA.

Crude protein extract precipitated from the phenol phase of
the protein extraction of cells from 2 litres of culture of
E. coli BL21 cells expressing MGSHHHHHHGSIEGR-α2MRAP (wherein
MGSHHHHHHGSIEGR is SEQ ID NO: 48) was dissolved in a buffer
containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and
50 mM dithioerythritol. Following gel filtration on Sephadex
G-25 (Pharmacia, Sweden) into 8 M urea, 0.5 M NaCl, 50 mM
Tris-HCl pH 8, and 1 mM methionine, the crude protein
preparation was applied to a Ni2+-activated NTA-agarose matrix
(Ni2+NTA-agarose) for purification (Hochuli et al., 1988) of
the fusion protein, MGSHHHHHHGSIEGR-α2MRAP (wherein
MGSHHHHHHGSIEGR is SEQ ID NO: 48), and for the subsequent
cyclic folding process.

All buffers prepared for liquid chromatography were degassed
under vacuum prior to addition of reductant and/or use.

Preparation and "charging" of the Ni2+NTA-agarose column are
described under Example 1.

Upon application of the crude protein extract to the
Ni2+NTA-agarose column, the fusion protein, MGSHHHHHHGSIEGR-α2MRAP
(wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), was purified from
the majority of E. coli and λ phage proteins by washing with one
column volume of the loading buffer followed by 6 M guanidinium
chloride, 50 mM Tris-HCl, and 1 mM methionine until the
optical density (OD) at 280 nm of the eluate was stable.

The fusion protein was refolded on the Ni2+NTA-agarose column
using a gradient manager profile as described in table 3, with
0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2 and 1 mM
2-mercaptoethanol as buffer A and 6 M guanidinium chloride,
50 mM Tris-HCl pH 8, 2 mM CaCl2 and 1 mM 2-mercaptoethanol as
buffer B.

After completion of the cyclic folding procedure the α2MRAP
fusion protein was eluted from the Ni2+NTA-agarose column
with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM
EDTA pH 8.

Virtually no fusion protein was found to be aggregated or
precipitated on the Ni2+NTA-agarose column. The estimated
yield of α2MRAP fusion protein was 60 mg and the efficiency
of the folding procedure close to 95%.

The fusion protein MGSHHHHHHGSIEGR-α2MRAP (wherein
MGSHHHHHHGSIEGR is SEQ ID NO: 48) was cleaved with the bovine
restriction protease FXa overnight at room temperature at a
weight to weight ratio of 200:1 in the elution buffer. Upon
gel filtration on Sephadex G-25 into 100 mM NaCl, 25 mM
Tris-HCl pH 8, the protein solution was passed through a
Ni2+NTA-agarose column, thereby removing uncleaved fusion
protein and the liberated N-terminal fusion tail originating
from cleaved fusion proteins. Finally the protein solution was
diluted 1:4 with water and the α2MRAP protein purified from FXa
by ion exchange chromatography on Q-Sepharose (Pharmacia,
Sweden). The Q-Sepharose column was eluted with a linear
gradient over 6 column volumes from 25 mM NaCl, 25 mM Tris-HCl
pH 8 to 250 mM NaCl, 25 mM Tris-HCl pH 8. The α2MRAP protein
eluted at the very beginning of the linear gradient whereas
FXa eluted later.
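
The quantities in this cleavage and dilution step can be
illustrated with a short calculation. This is a sketch under two
stated assumptions: that the estimated 60 mg of fusion protein is
the whole substrate for cleavage, and that "diluted 1:4" means a
fourfold final dilution (one volume of sample plus three volumes
of water). The function names are illustrative.

```python
def protease_mass_mg(substrate_mg: float, weight_ratio: float) -> float:
    """Protease mass for a given substrate:protease weight ratio."""
    return substrate_mg / weight_ratio


def diluted_concentration(conc_mm: float, fold: float) -> float:
    """Concentration after diluting 'fold'-fold with water."""
    return conc_mm / fold


# 60 mg fusion protein (estimated yield in the text) at 200:1 (w/w).
fxa_mg = protease_mass_mg(60.0, 200.0)

# ASSUMPTION: "1:4" is read as a fourfold final dilution.
nacl_mm = diluted_concentration(100.0, 4.0)
tris_mm = diluted_concentration(25.0, 4.0)

print(f"FXa required: {fxa_mg:.2f} mg")
print(f"After dilution: {nacl_mm:.2f} mM NaCl, {tris_mm:.2f} mM Tris-HCl")
```

Under this reading the diluted sample contains 25 mM NaCl, which
matches the NaCl concentration at the start of the Q-Sepharose
gradient.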

The yield of α2MRAP protein produced and refolded by this
procedure was 40 mg.

The ligand binding characteristics (i.e. binding to the
α2-Macroglobulin Receptor and interference with the binding of
human Urokinase Plasminogen Activator - Plasminogen Activator
Inhibitor type-I complex to the α2-M Receptor) have, according
to Dr. Nykjær, been found identical to the ligand binding
characteristics of the purified natural protein.

EXAMPLE 4 

Production and folding of domains and domain-clusters from
the α2-M Receptor

The human α2-Macroglobulin Receptor/Low Density Lipoprotein
Receptor-Related Protein (α2-MR) is a 600 kDa endocytotic
membrane receptor. α2-MR is synthesized as a 4524 amino acid
single chain precursor protein. The precursor is processed
into an 85 kDa transmembrane β-chain and a 500 kDa α-chain,
non-covalently bound to the extracellular domain of the
β-chain. The α2-MR is known to bind Ca2+ in a structure-dependent
manner (i.e. the reduced protein does not bind Ca2+) and
is believed to be multifunctional in the sense that α2-MR
binds ligands of different classes.

The entire amino acid sequence of the α-chain can be
represented by clusters of three types of repeats also found in
other membrane bound receptors and in various plasma proteins:

A: This type of repeat spans approximately 40 amino acid
residues and is characterised by the sequential appearance of
the six cysteinyl residues contained in the repeat. Some
authors have named this repeat the complement-type domain.

B: This type of repeat also spans approximately 40 amino acid
residues and is characterised by the sequential appearance of
the six cysteinyl residues contained in the repeat. In the
literature this repeat has been named the EGF-type domain.

C: This type of repeat spans approximately 55 amino acid
residues and is characterised by the presence of the
consensus sequence SEQ ID NO: 39.

This example describes the production in E. coli of a number
of domains and domain-clusters derived from the α2-MR protein
as FXa-cleavable fusion proteins and the purification, in
vitro folding, and FXa cleavage and processing of these
recombinant proteins.

A plasmid clone containing the full length cDNA encoding the
human α2-MR protein (generously provided by Dr. Joachim Herz;
Herz et al., EMBO J., 7: 4119-4127, 1988) was used as template
in a series of Polymerase Chain Reactions (PCR)
designed to produce cDNA fragments corresponding to a number
of polypeptides representing domains and domain-clusters
derived from the α2-MR protein:

#1: Contains two domains of the A-type, corresponding to
amino acid residues 20 to 109 in the α2-MR protein. The
primers SEQ ID NO: 11 and SEQ ID NO: 12 were used in the PCR.

#2: Contains two domains of the A-type followed by two type-B
domains, corresponding to amino acid residues 20 to 190 in the
α2-MR protein. The primers SEQ ID NO: 11 and SEQ ID NO: 13
were used in the PCR.

#3: Identical to #2 followed by a region containing YWTD
repeats, corresponding to amino acid residues 20 to 521. The
primers SEQ ID NO: 11 and SEQ ID NO: 14 were used in the PCR.

#4: Contains one type-B domain, followed by 8 type-A domains
and finally two type-B domains, corresponding to amino acid
residues 803 to 1265 in the α2-MR protein. The primers SEQ ID
NO: 15 and SEQ ID NO: 16 were used in the PCR.

#5: Contains only the 8 type-A domains also present in #4,
corresponding to amino acid residues 849 to 1184 in the α2-MR
protein. The primers SEQ ID NO: 17 and SEQ ID NO: 18 were
used in the PCR.

#6: Contains the two C-terminal type-B domains from #4,
followed by 8 YWTD repeats and one type-B domain,
corresponding to amino acid residues 1184 to 1582 in the α2-MR
protein. The primers SEQ ID NO: 19 and SEQ ID NO: 20 were used
in the PCR.

#7: Contains the whole region included in constructs #4 to
#6, corresponding to amino acid residues 803 to 1582 in the
α2-MR protein. The primers SEQ ID NO: 15 and SEQ ID NO: 20
were used in the PCR.

#8: Contains 10 type-A domains, corresponding to amino acid
residues 2520 to 2941 in the α2-MR protein. The primers SEQ ID
NO: 21 and SEQ ID NO: 22 were used in the PCR.

#9: Contains 11 type-A domains, corresponding to amino acid
residues 3331 to 3778 in the α2-MR protein. The primers SEQ ID
NO: 23 and SEQ ID NO: 24 were used in the PCR.
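
The nine constructs listed above can be summarised as a small
lookup table. The sketch below merely restates the residue ranges
and primer numbers given in the list; the class and function names
are illustrative, and the lengths refer to the α2-MR-derived part
only (the N-terminal fusion tail is not counted).

```python
from typing import NamedTuple, Tuple


class Construct(NamedTuple):
    first_residue: int        # first alpha2-MR residue included
    last_residue: int         # last alpha2-MR residue included
    primers: Tuple[int, int]  # SEQ ID NO of the two PCR primers


CONSTRUCTS = {
    1: Construct(20, 109, (11, 12)),
    2: Construct(20, 190, (11, 13)),
    3: Construct(20, 521, (11, 14)),
    4: Construct(803, 1265, (15, 16)),
    5: Construct(849, 1184, (17, 18)),
    6: Construct(1184, 1582, (19, 20)),
    7: Construct(803, 1582, (15, 20)),
    8: Construct(2520, 2941, (21, 22)),
    9: Construct(3331, 3778, (23, 24)),
}


def length(construct: Construct) -> int:
    """Number of alpha2-MR residues spanned (both ends included)."""
    return construct.last_residue - construct.first_residue + 1


def contains(outer: Construct, inner: Construct) -> bool:
    """True if 'inner' lies entirely within 'outer'."""
    return (outer.first_residue <= inner.first_residue
            and inner.last_residue <= outer.last_residue)


for number, construct in CONSTRUCTS.items():
    print(f"#{number}: residues {construct.first_residue}-"
          f"{construct.last_residue} ({length(construct)} aa), "
          f"primers SEQ ID NO: {construct.primers[0]} and {construct.primers[1]}")
```

The `contains` helper makes the nesting explicit: the ranges given
for #4, #5 and #6 all fall within the range given for #7, as the
description of #7 states.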

The amplified nucleotide sequences encoding the domains and
domain-clusters were at their 5'-end, via the PCR reaction,
linked to nucleotide sequences (included in SEQ ID NO: 11,
15, 17, 19, 21 and 23) encoding the amino acid sequence SEQ
ID NO: 37, which constitutes a cleavage site for the bovine
restriction protease FXa (Nagai and Thøgersen, Methods in
Enzymology, 152: 461-481, 1987). The amplified DNA fragments
were subcloned either into the E. coli expression vector
pT7H6 (Christensen et al., FEBS Letters, 295: 181-184, 1991)
or into the expression plasmid pLcIIMLCH6, which is modified
from pLcIIMLC (Nagai et al., Nature, 332: 284-286, 1988) by the
insertion of an oligonucleotide encoding six histidinyl
residues C-terminal of the myosin light chain fragment. The
construction of the resulting plasmids pT7H6FX-#1 to #3 and
pLcIIMLCH6FX-#4 to #9 is outlined in fig. 6-8, and figure 9
shows the amino acid sequence of the expressed protein (SEQ ID
NO: 52 shows the amino acid sequence encoded by the full
length reading frame).

The domains and domain-clusters subcloned in the pT7H6FX
series were grown and expressed in E. coli BL21 cells at
medium scale (2 litre) as described by Studier and Moffatt,
J. Mol. Biol., 189: 113-130, 1986. Exponentially growing
cultures at 37°C were, at an OD600 of 0.8, infected with
bacteriophage λCE6 at a multiplicity of approximately 5. Cultures
were grown at 37°C for another three hours before cells were
harvested by centrifugation. Cells were lysed by osmotic
shock and sonication, and total cellular protein was extracted
into phenol (adjusted to pH 8 with Trizma base).
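
To give a feeling for the phage input implied by a multiplicity of
approximately 5, the sketch below converts the stated OD600 and
culture volume into a phage number. The conversion factor of
8 x 10^8 cells per ml per OD600 unit is a common rule of thumb for
E. coli and is an assumption; it is not taken from the text and
varies with strain and spectrophotometer.

```python
# ASSUMPTION: rule-of-thumb cell density, not a value from the text.
CELLS_PER_ML_PER_OD600 = 8e8


def phage_required(od600: float, culture_ml: float, multiplicity: float,
                   cells_per_ml_per_od: float = CELLS_PER_ML_PER_OD600) -> float:
    """Plaque-forming units needed to infect a culture at a given multiplicity."""
    total_cells = od600 * cells_per_ml_per_od * culture_ml
    return multiplicity * total_cells


# 2 litre culture at OD600 0.8, multiplicity of approximately 5.
pfu = phage_required(0.8, 2000.0, 5.0)
print(f"Phage required: about {pfu:.1e} pfu")
```

Because the result scales linearly with the assumed conversion
factor, it should be read as an order-of-magnitude estimate only.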

The domain-clusters subcloned in the pLcIIMLCH6 series were
grown and expressed in E. coli QY13 cells as described in
Nagai and Thøgersen, Methods in Enzymology, 152: 461-481,
1987. Exponentially growing cultures (4 litre) at 30°C were,
at an OD600 of 1.0, transferred to 42°C for 15 min. This heat
shock induces synthesis of the fusion proteins. The cultures
were further incubated at 37°C for three to four hours before
cells were harvested by centrifugation. Cells were lysed by
osmotic shock and sonication, and total cellular protein was
extracted into phenol (adjusted to pH 8 with Trizma base).

Crude protein was precipitated from the phenol phase by
addition of 2.5 volumes of ethanol and centrifugation. The
protein pellet was dissolved in a buffer containing 6 M
guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M
dithioerythritol. Following gel filtration on Sephadex G-25
(Pharmacia, Sweden) into 8 M urea, 1 M NaCl, 50 mM Tris-HCl
pH 8, 10 mM 2-mercaptoethanol and 2 mM methionine, the crude
protein preparations were applied to Ni2+-activated NTA-agarose
columns for purification (Hochuli et al., 1988) of the fusion
proteins and for the subsequent cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed
under vacuum prior to addition of reductant and/or use.

Preparation and "charging" of the Ni2+NTA-agarose column are
described under Example 1.



Upon application of the crude protein extracts to the
Ni2+NTA-agarose column, the fusion proteins were purified
from the majority of E. coli and λ phage proteins by washing
with one column volume of the loading buffer followed by 6 M
guanidinium chloride, 50 mM Tris-HCl, 10 mM 2-mercaptoethanol,
and 2 mM methionine until the optical density (OD) at
280 nm of the eluate was stable.

Each of the fusion proteins was refolded on the
Ni2+NTA-agarose column using a gradient manager profile as
described in table 4, with 0.5 M NaCl, 50 mM Tris-HCl pH 8,
2 mM CaCl2, 0.33 mM methionine, and 2.0 mM/0.2 mM
reduced/oxidized glutathione as buffer A and 4 M urea, 0.5 M
NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2, 2 mM methionine, and
3 mM reduced glutathione as buffer B. The reduced/oxidized
glutathione solution was freshly prepared as a 100 times stock
solution by addition of 9.9 M H2O2 to a stirred solution of
0.2 M reduced glutathione before addition to buffer A.



After completion of the cyclic folding procedure the fusion
proteins representing domains and domain-clusters derived
from the α2-MR protein were eluted from the Ni2+NTA-agarose
column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl,
5 mM EDTA pH 8. Fusion proteins that were aggregated and
precipitated on the Ni2+NTA-agarose column were eluted in
buffer B.



Approximately 75% of the fusion protein material expressed
from the plasmids pT7H6FX-#1 and #2, representing the
N-terminal two and four cysteine-rich domains of the α2-MR
protein, was eluted from the Ni2+NTA-agarose column by the
non-denaturing buffer. The majority of this fusion protein
material appeared monomeric as judged by non-reducing
SDS-PAGE analysis. The yield of monomeric fusion protein #1 and
#2 was estimated at approximately 50 mg.

Approximately 50% of the fusion protein material expressed
from all other expression plasmids representing domain-clusters
derived from the α2-MR protein was eluted from the
Ni2+NTA-agarose column by the non-denaturing buffer. Between
30% (fusion proteins #5 and #7) and 65% (fusion protein #4)
of these fusion proteins appeared monomeric as judged by
non-reducing SDS-PAGE analysis (see Fig. 17, lanes 9 and 10).

Each fusion protein eluted by the non-denaturing elution
buffer was cleaved with the restriction protease FXa
overnight at room temperature at an estimated weight to weight
ratio of 100 to one.

Upon gel filtration on Sephadex G-25 into 100 mM NaCl, 25 mM
Tris-HCl pH 8, the protein solution was passed through a
Ni2+NTA-agarose column, thereby removing uncleaved fusion
protein and the liberated N-terminal fusion tail originating
from the cleaved fusion proteins. FXa was removed from the
solution by passing the recombinant protein solutions through
a small column of SBTI-agarose (Soy Bean Trypsin Inhibitor
immobilized on Sepharose CL-6B (Pharmacia, Sweden)).

SDS-PAGE analysis of the refolded, soluble fusion protein
product #4 is presented in fig. 17, lanes 9 and 10, showing
reduced and unreduced samples, respectively. The mobility
increase observed for the unreduced sample reflects the
compactness of the polypeptide due to the presence of 33
disulphide bridges.
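
The figure of 33 disulphide bridges follows from the domain
composition given for construct #4 and the six cysteinyl residues
stated for each type-A and type-B repeat. The sketch below shows
the count; it assumes that every cysteine in the repeats is paired
and that no cysteines outside the repeats contribute. The names
used are illustrative.

```python
# Six cysteinyl residues per type-A and per type-B repeat (from the text).
CYSTEINES_PER_REPEAT = {"A": 6, "B": 6}

# Construct #4: one type-B, eight type-A, then two type-B domains.
CONSTRUCT_4 = [("B", 1), ("A", 8), ("B", 2)]


def disulphide_bridges(domains) -> int:
    """Number of disulphide bridges if every repeat cysteine is paired."""
    cysteines = sum(CYSTEINES_PER_REPEAT[kind] * count
                    for kind, count in domains)
    return cysteines // 2


print(disulphide_bridges(CONSTRUCT_4))  # prints 33
```

Eleven repeats with six cysteines each give 66 cysteines, hence
33 bridges.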

Each of the recombinant proteins was found to bind Ca2+ in a
structure-dependent manner.

It was found by Dr. Søren Moestrup that a monoclonal antibody,
A2MRα-5, derived from the natural human α2-MR, bound the
recombinant proteins expressed by the constructs #4, #6, and
#7, whereas a monospecific antibody, A2MRα-3, also derived from
natural α2-MR, was found to bind the recombinant protein
expressed by construct #8. The binding specificity of both
antibodies is structure dependent (i.e. the antibodies react
neither with reduced α2-MR nor with reduced recombinant
protein).



EXAMPLE 5

Production and folding of bovine coagulation Factor Xa (FXa)

This example describes the production in E. coli of one
fragment derived from bovine FXa as a FXa-cleavable fusion
protein and the purification, in vitro folding, and
processing of the recombinant protein.

The cDNA encoding bovine FX was cloned by specific
amplification in a Polymerase Chain Reaction (PCR) of the
nucleotide sequences encoding bovine FX from amino acid residue
Ser82 to Trp484 (SEQ ID NO: 2, residues 82-484) (FXΔγ, amino
acid numbering relates to the full coding reading frame) using
1st strand oligo-dT primed cDNA synthesized from total bovine
liver RNA as template. Primers used in the PCR were SEQ ID
NO: 25 and SEQ ID NO: 26. RNA extraction and cDNA synthesis
were performed using standard procedures.

The amplified reading frame encoding FXΔγ was at the 5'-end,
via the PCR reaction, linked to nucleotide sequences encoding
the amino acid sequence SEQ ID NO: 37, which constitutes a
cleavage site for the bovine restriction protease FXa (Nagai
and Thøgersen, Methods in Enzymology, 152: 461-481, 1987).
The amplified DNA fragment was cloned into the E. coli
expression vector pLcIIMLCH6, which is modified from pLcIIMLC
(Nagai et al., Nature, 332: 284-286, 1988) by the insertion
of an oligonucleotide encoding six histidinyl residues
C-terminal of the myosin light chain fragment. The construction
of the resulting plasmid pLcIIMLCH6FX-FXΔγ is outlined in fig.
10, and figure 11 shows the amino acid sequence of the
expressed protein (SEQ ID NO: 53 shows the amino acid
sequence encoded by the full length reading frame).

The pLcIIMLCH6FX-FXΔγ plasmid was grown and expressed in E.
coli QY13 cells as described in Nagai and Thøgersen (Methods
in Enzymology, 152: 461-481, 1987). Exponentially growing
cultures at 30°C were, at an OD600 of 1.0, incubated at 42°C
for 15 min. This heat shock induces synthesis of the fusion
proteins. The cultures were further incubated at 37°C for three
to four hours before cells were harvested by centrifugation.
Cells were lysed by osmotic shock and sonication, and total
cellular protein was extracted into phenol (adjusted to pH 8
with Trizma base).

Crude protein was precipitated from the phenol phase by
addition of 2.5 volumes of ethanol and centrifugation. The
protein pellet was dissolved in a buffer containing 6 M
guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M
dithioerythritol. Following gel filtration on Sephadex G-25
(Pharmacia, LKB, Sweden) into 8 M urea, 1 M NaCl, 50 mM Tris-HCl
pH 8, 10 mM 2-mercaptoethanol, the crude protein preparation
was applied to a Ni2+-activated NTA-agarose matrix for
purification (Hochuli et al., 1988) of the FXΔγ fusion protein
and for the subsequent cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed 
under vacuum prior to addition of reductant and/or use. 

Preparation and "charging" of the Ni2+NTA-agarose column are
described under Example 1.

Upon application of the crude protein extracts to the
Ni2+NTA-agarose column, the fusion proteins were purified
from the majority of E. coli and λ phage proteins by washing
with one column volume of the loading buffer followed by 6 M
guanidinium chloride, 50 mM Tris-HCl, and 10 mM
2-mercaptoethanol until the optical density (OD) at 280 nm of
the eluate was stable.

The fusion protein was refolded on the Ni2+NTA-agarose column
using a gradient manager profile as described in table 5, with
0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2, and 2.0 mM/0.2
mM reduced/oxidized glutathione as buffer A and 8 M urea, 0.5
M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2, and 3 mM reduced
glutathione as buffer B. The reduced/oxidized glutathione
solution was freshly prepared as a 100 times stock solution
by addition of 9.9 M H2O2 to a stirred solution of 0.2 M
reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the FXΔγ
fusion protein was eluted from the Ni2+NTA-agarose column
with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 5 mM
EDTA pH 8. Fusion protein that was aggregated and precipitated
on the Ni2+NTA-agarose column was eluted in buffer B.



Approximately 33% of the FXΔγ fusion protein material was
eluted from the Ni2+NTA-agarose column by the non-denaturing
buffer. The amount of FXΔγ fusion protein was estimated at 15
mg. Only about one third of this fusion protein material
appeared monomeric as judged by non-reducing SDS-PAGE
analysis, corresponding to an overall efficiency of the
folding procedure of approximately 10%.
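
The figures in this paragraph can be combined into a rough mass
balance. The sketch below assumes that the estimated 15 mg refers
to the material eluted by the non-denaturing buffer; the text does
not state this explicitly, so the derived amounts are indicative
only. The function name is illustrative.

```python
def mass_balance(eluted_mg: float, fraction_eluted: float,
                 fraction_monomeric: float) -> dict:
    """Back-calculate applied and monomeric amounts from the eluted mass."""
    applied_mg = eluted_mg / fraction_eluted
    monomer_mg = eluted_mg * fraction_monomeric
    return {
        "applied_mg": applied_mg,
        "monomer_mg": monomer_mg,
        "efficiency": monomer_mg / applied_mg,
    }


# ASSUMPTION: the 15 mg estimate is the non-denaturing eluate.
# About 33% of the material eluted, about one third of it monomeric.
fx = mass_balance(15.0, 0.33, 1.0 / 3.0)
print(f"Applied to column: about {fx['applied_mg']:.0f} mg")
print(f"Monomeric product: about {fx['monomer_mg']:.0f} mg")
print(f"Overall efficiency: about {fx['efficiency']:.0%}")
```

The resulting efficiency of about 11% is consistent with the
stated value of approximately 10%.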



FXΔγ fusion protein in non-denaturing buffer was activated by
passing the recombinant protein solution through a small
column of trypsin-agarose (trypsin immobilized on Sepharose
CL-6B (Pharmacia, Sweden)).

The activated recombinant FXΔγ fusion protein was assayed for
proteolytic activity and substrate specificity profile using
standard procedures with chromogenic substrates. The activity
and substrate specificity profile were indistinguishable from
those obtained for natural bovine FXa.

EXAMPLE 6

Production and folding of kringle domains 1 and 4 from human 
plasminogen 

This example describes the production in E. coli of the
lysine binding kringle domains 1 and 4 from human plasminogen
(K1 and K4, respectively) as FXa-cleavable fusion proteins
and the purification and in vitro folding of the K1- and
K4-fusion proteins.

A plasmid clone containing the full length cDNA encoding
human plasminogen cloned into the general cloning vector
pUC18 (generously provided by Dr. Earl Davie, Seattle, USA)
was used as template in a Polymerase Chain Reaction (PCR)
designed to produce cDNA fragments corresponding to K1
(corresponding to amino acid residues Ser81 to Glu162 in
so-called Glu-plasminogen) and K4 (corresponding to amino acid
residues Val354 to Ala439 in so-called Glu-plasminogen). The
primers SEQ ID NO: 27 and SEQ ID NO: 28 were used in the PCR
producing K1 and the primers SEQ ID NO: 29 and SEQ ID NO: 30
were used in the PCR producing K4.

The amplified reading frames encoding K1 and K4 were at their
5'-ends, via the PCR reaction, linked to nucleotide
sequences, included in SEQ ID NO: 27 and SEQ ID NO: 29,
encoding the amino acid sequence SEQ ID NO: 37, which
constitutes a cleavage site for the bovine restriction protease
FXa (Nagai and Thøgersen, Methods in Enzymology, 152:
461-481, 1987). The amplified K1 DNA fragment was cloned into
the E. coli expression vector pLcIIMLCH6, which is modified
from pLcIIMLC (Nagai et al., Nature, 332: 284-286, 1988) by
the insertion of an oligonucleotide encoding six histidinyl
residues C-terminal of the myosin light chain fragment. The
construction of the resulting plasmid pLcIIMLCH6FX-K1 is
outlined in fig. 12. The amplified K4 DNA fragment was cloned
into the E. coli expression vector pLcIIH6, which is modified
from pLcII (Nagai and Thøgersen, Methods in Enzymology, 152:
461-481, 1987) by the insertion of an oligonucleotide
encoding six histidinyl residues C-terminal of the cII
fragment. The construction of the resulting plasmid pLcIIH6FX-K4
is outlined in fig. 13, and fig. 14 shows the amino acid
sequence of human "Glu"-plasminogen (SEQ ID NO: 54).

Both the pLcIIMLCH6FX-K1 plasmid and the pLcIIH6FX-K4 plasmid
were grown and expressed in E. coli QY13 cells as described
in Nagai and Thøgersen, Methods in Enzymology, 152: 461-481,
1987. Exponentially growing cultures at 30°C were, at an OD600
of 1.0, transferred to 42°C for 15 min. This heat shock induces
synthesis of the fusion proteins. The cultures were further
incubated at 37°C for three to four hours before cells were
harvested by centrifugation. Cells were lysed by osmotic
shock and sonication, and total cellular protein was extracted
into phenol (adjusted to pH 8 with Trizma base).

Crude protein was precipitated from the phenol phase by
addition of 2.5 volumes of ethanol and centrifugation. The
protein pellet was dissolved in a buffer containing 6 M
guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M
dithioerythritol. Following gel filtration on Sephadex G-25
(Pharmacia, Sweden) into 8 M urea, 1 M NaCl, 50 mM Tris-HCl
pH 8, 10 mM 2-mercaptoethanol, and 2 mM methionine, the crude
protein preparation was applied to a Ni2+-activated NTA-agarose
matrix for purification (Hochuli et al., 1988) of the K1-
and K4-fusion proteins and for the subsequent cyclic
folding procedure.

All buffers prepared for liquid chromatography were degassed 
under vacuum prior to addition of reductant and/or use. 

Preparation and "charging" of the Ni2+NTA-agarose column are
described under Example 1.

Upon application of the crude protein extracts to the
Ni2+NTA-agarose column, the fusion proteins were purified
from the majority of E. coli and λ phage proteins by washing
with one column volume of the loading buffer followed by 6 M
guanidinium chloride, 50 mM Tris-HCl, 10 mM 2-mercaptoethanol,
and 2 mM methionine until the optical density (OD) at 280 nm
of the column eluate was stable.

The fusion protein was refolded on the Ni2+NTA-agarose column
using a gradient manager profile as described in table 4 with
0.5 M NaCl, 50 mM Tris-HCl pH 8, 10 mM 6-aminohexanoic acid
(ε-aminocaproic acid, ε-ACA), 0.33 mM methionine, and 2.0
mM/0.2 mM reduced/oxidized glutathione as buffer A and 4 M
urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 10 mM ε-ACA, 2 mM
methionine, and 3 mM reduced glutathione as buffer B. The
reduced/oxidized glutathione solution was freshly prepared as
a 100 times stock solution by addition of 9.9 M H2O2 to a
stirred solution of 0.2 M reduced glutathione before addition
to buffer A.

After completion of the cyclic folding procedure each of the
K1- and K4-fusion proteins was eluted from the
Ni2+NTA-agarose column with a buffer containing 0.5 M NaCl,
50 mM Tris-HCl, 5 mM EDTA pH 8. Fusion proteins that were
aggregated and precipitated on the Ni2+NTA-agarose column were
eluted in buffer B.

Virtually all of the K1- and K4-fusion protein material was
eluted from the Ni2+NTA-agarose columns by the non-denaturing
buffer. The estimated yields of K1-fusion protein and
K4-fusion protein were approximately 60 mg. Virtually all of the
K1-fusion protein as well as the K4-fusion protein appeared
monomeric as judged by non-reducing SDS-PAGE analysis,
corresponding to an efficiency of the folding procedure above
90%.

SDS-PAGE analysis of the production of recombinant plasminogen
kringles 1 and 4 is presented in fig. 17.

The K1-fusion protein and the K4-fusion protein were further
purified by affinity chromatography on lysine-Sepharose CL-6B
(Pharmacia, Sweden). The fusion proteins were eluted from the
affinity columns by a buffer containing 0.5 M NaCl, 50 mM
Tris-HCl pH 8, 10 mM ε-ACA.

Binding to lysine-Sepharose is normally accepted as an
indication of correct folding of lysine binding kringle domains.

The three dimensional structures of recombinant K1 and K4
protein domains, produced by this cyclic folding procedure
and fully processed by liberation from the N-terminal
fusion tail and subsequent purification by ion
exchange chromatography, have been confirmed by X-ray
diffraction (performed by Dr. Robert Huber) and two dimensional
NMR analysis (performed by stud. scient. Peter Reinholdt and
Dr. Flemming Poulsen).

The general yield of fully processed recombinant K1 and K4
protein domains by this procedure is 5 mg/litre culture.

EXAMPLE 7 

Production in E. coli and refolding of recombinant fragments
derived from human α2-Macroglobulin and chicken Ovostatin

This example describes the production in E. coli of the
receptor-binding domain of human α2-Macroglobulin (α2-MRBDv)
as a FXa-cleavable fusion protein, and the purification of
the recombinant α2-MRBDv after FXa cleavage.

The 462 bp DNA fragment encoding the α2-Macroglobulin reading
frame from amino acid residue Val1299 to Ala1451 (α2-MRDv) was
amplified in a Polymerase Chain Reaction (PCR), essentially
following the protocol of Saiki et al. (1988). pA2M
(generously provided by Dr. T. Kristensen), containing the full
length cDNA of human α2-Macroglobulin, was used as template,
and the oligonucleotides SEQ ID NO: 31 and SEQ ID NO: 32 as
primers. The amplified coding reading frame was at the
5'-end, via the PCR reaction, linked to a nucleotide
sequence, included in SEQ ID NO: 7, encoding the amino acid
sequence SEQ ID NO: 37, which constitutes a cleavage site for
the bovine restriction protease FXa (Nagai and Thøgersen,
1987). The amplified DNA fragment was subcloned into the E.
coli expression vector pT7H6 (Christensen et al., 1991). The
construction of the resulting plasmid pT7H6FX-α2MRDv
(expressing human α2-MRDv) is outlined in fig. 18 and the amino
acid sequence of the expressed protein is shown in fig. 19
(SEQ ID NO: 55).

Recombinant human α2MRDv was produced by growing and expressing
the plasmid pT7H6FX-α2MRDv in E. coli BL21 cells in a medium
scale (2 x 1 litre) as described by Studier and Moffatt,
J. Mol. Biol., 189: 113-130, 1986. Exponentially growing
cultures at 37°C were at OD600 0.8 infected with bacteriophage
λCE6 at a multiplicity of approximately 5. Cultures were grown
at 37°C for another three hours before cells were harvested by
centrifugation. Cells were lysed by osmotic shock and sonication
and total cellular protein was extracted into phenol (adjusted
to pH 8 with Trizma base). Protein was precipitated from the
phenol phase by addition of 2.5 volumes of ethanol and
centrifugation. The protein pellet was dissolved in a buffer
containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and
50 mM dithioerythritol. Following gel filtration on Sephadex
G-25 (Pharmacia, LKB, Sweden) into 8 M urea, 1 M NaCl, 50 mM
Tris-HCl pH 8, and 10 mM 2-mercaptoethanol, the crude protein
preparation was applied to a Ni2+-activated NTA-agarose column
(Ni2+NTA-agarose) for purification (Hochuli et al., 1988) of
the fusion protein, MGSHHHHHHGSIEGR-α2MRDv (wherein
MGSHHHHHHGSIEGR is SEQ ID NO: 48), and subsequently to undergo
the cyclic folding procedure.

Preparation and "charging" of the Ni2+NTA-agarose column is
described under Example 1.

All buffers prepared for liquid chromatography were degassed
under vacuum prior to addition of reductant and/or use.

Upon application of the crude protein extract on the
Ni2+NTA-agarose column, the fusion protein,
MGSHHHHHHGSIEGR-α2MRDv (wherein MGSHHHHHHGSIEGR is SEQ ID NO:
48), was purified from the majority of E. coli and λ phage
proteins by washing with one column volume of the loading
buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl,
and 10 mM 2-mercaptoethanol, until the optical density (OD) at
280 nm of the eluate was stable.

The fusion protein was refolded on the Ni2+NTA-agarose column
using a gradient manager profile as described in table 4 and
0.5 M NaCl, 50 mM Tris-HCl pH 8, and 2.0 mM/0.2 mM
reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M
NaCl, 50 mM Tris-HCl pH 8, and 5 mM reduced glutathione as
buffer B. The reduced/oxidized glutathione solution was
freshly prepared as a 200 times stock solution by addition of
9.9 M H2O2 to a stirred solution of 0.2 M reduced glutathione
before addition to buffer A.

After completion of the cyclic folding procedure the α2MRDv
fusion protein was eluted from the Ni2+NTA-agarose column
with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM
EDTA pH 8. Fusion protein that was aggregated and precipitated
on the Ni2+NTA-agarose column was eluted in buffer B.

Approximately 50% of the fusion protein material was eluted
in the aqueous elution buffer. Half of this fusion protein
material appeared monomeric and folded as judged by
non-reducing SDS-PAGE analysis.

Recombinant α2MRDv protein was liberated from the N-terminal
fusion tail by cleavage with the restriction protease FXa at
room temperature in a weight to weight ratio of approximately
50 to one for four hours. After cleavage the α2MRDv protein
was isolated from uncleaved fusion protein, the liberated
fusion tail, and FXa, by gel filtration on Sephadex G-25 into
10 mM NaCl, 50 mM Tris-HCl pH 8, followed by ion exchange
chromatography on Q-Sepharose: α2MRDv was eluted in a linear
gradient (over 10 column volumes) from 10 mM NaCl, 10 mM
Tris-HCl pH 8 to 500 mM NaCl, 10 mM Tris-HCl pH 8. The α2MRDv
protein eluted at 150 mM NaCl.
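
The position in a linear salt gradient at which a given NaCl
concentration is reached follows from simple interpolation. The
sketch below is illustrative only: the function name is
hypothetical, and it assumes an ideal linear gradient with no
mixer or delay volume, which a real chromatography system will
not satisfy exactly.

```python
def column_volumes_to_reach(c_target, c_start, c_end, gradient_cv):
    """Column volumes into a linear gradient at which c_target is reached.

    Assumes an ideal linear gradient (no mixing or delay volume).
    All concentrations must share one unit, e.g. mM NaCl.
    """
    if c_end == c_start:
        raise ValueError("gradient start and end concentrations are equal")
    low, high = sorted((c_start, c_end))
    if not low <= c_target <= high:
        raise ValueError("target concentration lies outside the gradient")
    return gradient_cv * (c_target - c_start) / (c_end - c_start)


# Gradient of this example: 10 mM to 500 mM NaCl over 10 column volumes,
# with the protein eluting at 150 mM NaCl.
cv = column_volumes_to_reach(150, 10, 500, 10)
print(f"150 mM NaCl is reached after {cv:.2f} column volumes")
```

Under these assumptions the protein elutes a little less than
three column volumes into the gradient.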

The recombinant α2MRDv domain binds to the α2M-receptor with
a similar affinity for the receptor as exhibited by the
complete α2-Macroglobulin molecule (referring to the estimated
KD in one ligand-one receptor binding (Moestrup and Gliemann,
1991)). Binding analysis was performed by Dr. Søren K. Moestrup
and stud. scient. Kare Lehmann.

EXAMPLE 8

Production in E. coli and refolding of recombinant fragments
derived from the trout virus VHS envelope glycoprotein G

Expression and in vitro refolding of recombinant fragments
derived from the envelope glycoprotein G from the trout virus
VHS in E. coli as FXa cleavable fusion proteins is performed
using general strategies and methods analogous to those
outlined in the general description of the "cyclic refolding
procedure" and given in Examples 1 through 6.

EXAMPLE 9 

Production in E. coli and refolding of recombinant human
Tetranectin and recombinant fragments derived from human
Tetranectin

Tetranectin is a tetrameric protein consisting of four
identical and non-covalently linked single chain subunits of
181 amino acid residues (17 kDa). Each subunit contains three
disulphide bridges and binds Ca2+. Tetranectin is found in
plasma and associated with extracellular matrix. Tetranectin
binds specifically to plasminogen kringle 4. This binding can
specifically be titrated by lysine or ω-amino acids.
The cDNA encoding the reading frame corresponding to the
mature tetranectin single chain subunit was cloned by specific
amplification in a Polymerase Chain Reaction (PCR) (Saiki
et al., 1988) of the nucleotide sequences from amino acid
residue Glu1 to Val181 using 1st strand oligo-dT primed cDNA
synthesized from total human placental RNA as template.
Primers used in the PCR were SEQ ID NO: 33 and SEQ ID NO: 34.
RNA extraction and cDNA synthesis were performed using
standard procedures.

The amplified reading frame encoding the monomer subunit of
tetranectin was at the 5'-end, via the PCR reaction, linked
to nucleotide sequences encoding the amino acid sequence SEQ
ID NO: 37, which constitutes a cleavage site for the bovine
restriction protease FXa (Nagai and Thøgersen, 1987). A
glycine residue was, due to the specific design of the 5'-PCR
primer (SEQ ID NO: 33), inserted between the C-terminal
arginine residue of the FXa cleavage site (SEQ ID NO: 37) and
the tetranectin Glu1 residue. The amplified DNA fragment was
subcloned into the E. coli expression vector pT7H6
(Christensen et al., 1991). The construction of the resulting
plasmid pT7H6FX-TETN (expressing the tetranectin monomer) is
outlined in fig. 20 and the amino acid sequence of the
expressed protein is shown in fig. 21 (in SEQ ID NO: 56 is
shown the amino acid sequence encoded by the full length
reading frame).

To prepare the tetranectin monomer, the plasmid pT7H6FX-TETN
was grown in medium scale (4 x 1 litre; 2xTY medium, 5 mM
MgSO4 and 100 µg ampicillin) in E. coli BL21 cells, as
described by Studier and Moffatt, J. Mol. Biol., 189: 113-130,
1986. Exponentially growing cultures at 37°C were at OD600
0.8 infected with bacteriophage λCE6 at a multiplicity of
approximately 5. Cultures were grown at 37°C for another
three hours and the cells harvested by centrifugation. Cells
were resuspended in 150 ml of 0.5 M NaCl, 10 mM Tris-HCl pH
8, and 1 mM EDTA pH 8. Phenol (100 ml adjusted to pH 8) was
added and the mixture sonicated to extract the total protein.
Protein was precipitated from the phenol phase by 2.5 volumes
of ethanol and centrifugation.



The protein pellet was dissolved in a buffer containing 6 M
guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M
dithioerythritol. Following gel filtration on Sephadex G-25
(Pharmacia, LKB, Sweden) into 8 M urea, 1 M NaCl, 50 mM
Tris-HCl pH 8 and 10 mM 2-mercaptoethanol, the crude protein
preparation was applied to a Ni2+-activated NTA-agarose column
(Ni2+NTA-agarose, 75 ml pre-washed with 8 M urea, 1 M NaCl,
50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol) for
purification (Hochuli et al., 1988) of the fusion protein,
MGSHHHHHHGSIEGR-TETN (wherein MGSHHHHHHGSIEGR is SEQ ID NO:
48).

Preparation and "charging" of the Ni2+NTA-agarose column is
described under Example 1.

All buffers prepared for liquid chromatography were degassed 
under vacuum prior to addition of reductant and/or use. 

The column was washed with 200 ml of 8 M urea, 1 M NaCl, 50
mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol (Buffer I) and
100 ml 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 10
mM 2-mercaptoethanol (Buffer II). The MGSHHHHHHGSIEGR-TETN
fusion protein was eluted with Buffer II containing 10 mM
EDTA pH 8 and the eluate was gel filtered on Sephadex G25
using Buffer I as eluant.

The protein eluted was then refolded. The fusion protein
MGSHHHHHHGSIEGR-TETN (wherein MGSHHHHHHGSIEGR is SEQ ID NO:
48) was mixed with 100 ml Ni2+NTA-agarose. The resin
containing bound protein was packed into a 5 cm diameter column
and washed with Buffer I supplemented with CaCl2 to 2 mM. The
fusion protein was refolded on the Ni2+NTA-agarose column at
11-12°C using a gradient manager profile as described in
table 4 and 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2 and
2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 8
[...] reduced glutathione as buffer B. The reduced/oxidized
glutathione solution was freshly prepared as a 200 times stock
solution by addition of 9.9 M H2O2 to a stirred solution of
0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the
tetranectin fusion protein was eluted from the Ni2+NTA-agarose
column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl,
25 mM EDTA pH 8. The tetranectin fusion protein was cleaved
with FXa at 4°C overnight in a molar ratio of 1:300. After
FXa cleavage the protein sample was concentrated 10 fold by
ultrafiltration on a YM10 membrane (Amicon). Recombinant
tetranectin was, after ten times dilution of the protein
sample with 2 mM CaCl2, isolated by ion-exchange
chromatography on Q-Sepharose (Pharmacia, Sweden) in a linear
gradient over 10 column volumes from 10 mM Tris-HCl pH 8, 2 mM
CaCl2 to 10 mM Tris-HCl pH 8, 2 mM CaCl2, and 0.5 M NaCl.

Recombinant tetranectin produced by this procedure was
analyzed by Dr. Inge Clemmensen, Rigshospitalet, Copenhagen.
Dr. Clemmensen found that the recombinant tetranectin, with
respect to binding to plasminogen kringle 4 and expression of
antigenic sites, behaved identically to naturally isolated
human tetranectin.

Preliminary experiments comparing the efficiency of refolding,
using the "cyclic refolding procedure", of recombinant
Tetranectin fusion protein bound to the Ni2+NTA-agarose
column versus recombinant Tetranectin contained in a dialysis
bag indicate a significantly improved yield of soluble
monomer from the solution refolding strategy. However, if
either product of the cycling procedures is subjected to
disulphide reshuffling in solution in the presence of 5 mM
CaCl2, virtually all of the polypeptide material is converted
to the correctly folded Tetranectin tetramer.
Denatured and reduced recombinant authentic Tetranectin,
contained in a dialysis bag, was refolded over 15 cyclic
exposures to buffer B (6 M urea, 100 mM NaCl, 50 mM Tris-HCl
pH 8, 2 mM/0.2 mM reduced/oxidized glutathione, 2 mM CaCl2
and 0.5 mM methionine) and buffer A (100 mM NaCl, 50 mM
Tris-HCl pH 8, 2 mM/0.2 mM reduced/oxidized glutathione, 2 mM
CaCl2, and 0.5 mM methionine).

EXAMPLE 10 

Production and folding of a diabody expressed intracellularly
in E. coli: Mab 32 diabody directed against tumour necrosis
factor.

Diabodies (described in Holliger et al., 1993) are artificial 
bivalent and bispecific antibody fragments. 

This example describes the production in E. coli of a diabody
directed against tumour necrosis factor alpha (TNF-α),
derived from the mouse monoclonal antibody Mab 32 (Rathjen et
al., 1991, 1992; Australian Patent Appl. 7,576;
EP-A-486,526).

A phagemid clone, pCANTAB5-myc-Mab32-5, containing Mab32
encoded in the diabody format (PCT/GB93/02492) was generously
provided by Dr. G. Winter, Cambridge Antibody Technology
(CAT) Ltd., Cambridge, UK. pCANTAB5-myc-Mab32-5 DNA was used
as template in a Polymerase Chain Reaction (PCR) (Saiki et
al., 1988), using the primers SEQ ID NO: 35 and SEQ ID NO:
36, designed to produce a cDNA fragment corresponding to the
complete artificial diabody. The amplified coding reading
frame was at the 5'-end, via the PCR reaction, linked to a
nucleotide sequence, included in SEQ ID NO: 35, encoding the
amino acid sequence SEQ ID NO: 37, which constitutes a
cleavage site for the bovine restriction protease FXa (Nagai
and Thøgersen, 1987). The amplified DNA fragment was subcloned
into the E. coli expression vector pT7H6 (Christensen et al.,
1991). The construction of the resulting plasmid pT7H6FX-DB32
(expressing the Mab32 diabody) is outlined in fig. 22 and the
amino acid sequence of the expressed protein is shown in fig.
23 (in SEQ ID NO: 57 is shown the amino acid sequence encoded
by the full length reading frame).

To prepare the diabody fragment, the plasmid pT7H6FX-DB32 was
grown in medium scale (4 x 1 litre; 2xTY medium, 5 mM MgSO4
and 100 µg ampicillin) in E. coli BL21 cells, as described by
Studier and Moffatt, J. Mol. Biol., 189: 113-130, 1986.
Exponentially growing cultures at 37°C were at OD600 0.8
infected with bacteriophage λCE6 at a multiplicity of
approximately 5. Forty minutes after infection, rifampicin was
added (0.2 g in 2 ml methanol per litre media). Cultures were
grown at 37°C for another three hours and the cells harvested
by centrifugation. Cells were resuspended in 150 ml of 0.5 M
NaCl, 10 mM Tris-HCl pH 8, and 1 mM EDTA pH 8. Phenol (100 ml
adjusted to pH 8) was added and the mixture sonicated to
extract the total protein. Protein was precipitated from the
phenol phase by 2.5 volumes of ethanol and centrifugation.

The protein pellet was dissolved in a buffer containing 6 M
guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M
dithioerythritol. Following gel filtration on Sephadex G-25
(Pharmacia, LKB, Sweden) into 8 M urea, 1 M NaCl, 50 mM
Tris-HCl pH 8 and 10 mM 2-mercaptoethanol, the crude protein
preparation was applied to a Ni2+-activated NTA-agarose column
(Ni2+NTA-agarose, 75 ml pre-washed with 8 M urea, 1 M NaCl,
50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol) for
purification (Hochuli et al., 1988) of the fusion protein,
MGSHHHHHHGSIEGR-DB32 (wherein MGSHHHHHHGSIEGR is SEQ ID NO:
48).

Preparation and "charging" of the Ni2+NTA-agarose column is
described under Example 1.

All buffers prepared for liquid chromatography were degassed 
under vacuum prior to addition of reductant and/or use. 
The column was washed with 200 ml of 8 M urea, 1 M NaCl, 50
mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol (Buffer I) and
100 ml 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 10
mM 2-mercaptoethanol (Buffer II). The MGSHHHHHHGSIEGR-DB32
fusion protein was eluted with Buffer II containing 10 mM
EDTA pH 8 and the eluate was gel filtered on Sephadex G25
using Buffer I as eluant.

The protein eluted was then refolded. The fusion protein
MGSHHHHHHGSIEGR-DB32 (wherein MGSHHHHHHGSIEGR is SEQ ID NO:
48) was mixed with 100 ml Ni2+NTA-agarose. The resin
containing bound protein was packed into a 5 cm diameter column
and washed with Buffer I. The fusion protein was refolded on
the Ni2+NTA-agarose column at 11-12°C using a gradient manager
profile as described in table 4 and 0.5 M NaCl, 50 mM
Tris-HCl pH 8, and 2.0 mM/0.2 mM reduced/oxidized glutathione
as buffer A and 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and
3 mM reduced glutathione as buffer B. The reduced/oxidized
glutathione solution was freshly prepared as a 200 times
stock solution by addition of 9.9 M H2O2 to a stirred
solution of 0.2 M reduced glutathione before addition to
buffer A.

After completion of the cyclic folding procedure the DB32
fusion protein was eluted from the Ni2+NTA-agarose column
with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 25 mM
EDTA pH 8, and the eluate was adjusted to 5 mM GSH, 0.5 mM
GSSG and incubated for 12 to 15 hours at 20°C. The fusion
protein was then concentrated 50 fold by ultrafiltration using
YM10 membranes and clarified by centrifugation.

The DB32 fusion protein dimer was purified by gel filtration
using a Superose 12 column (Pharmacia, Sweden) with PBS as
eluant.



The overall yield of correctly folded DB32 fusion protein 
from this procedure was 4 mg per litre. 
An analysis by non-reducing SDS-PAGE from different stages of
the purification is shown in fig. 26.

The MGSHHHHHHGSIEGR (SEQ ID NO: 48) N-terminal fusion peptide
was cleaved off the DB32 protein by cleavage with the
restriction protease FXa (molar ratio 1:5 FXa:DB32 fusion
protein) at 37°C for 20 hours. This is shown as the appearance
of a lower molecular weight band just below the uncleaved
fusion protein in fig. 26.

The refolded DB32 protein was analyzed by Cambridge Antibody
Technology Ltd. (CAT). DB32 was found to bind specifically to
TNF-α and to compete with the Mab32 whole antibody for
binding to TNF-α. Furthermore, both DB32 and Mab32 were
competed in binding to TNF-α by sheep anti-301 antiserum, which
has been raised by immunizing sheep with a peptide that encodes
the first 18 amino acids of human TNF-α and comprises at least
part of the epitope recognised by the murine Mab32.

EXAMPLE 11 

Production and refolding of human psoriasin in E. coli. 

Psoriasin is a single domain Ca2+-binding protein of 100
amino acid residues (11.5 kDa). Psoriasin contains a single
disulphide bridge. The protein, which is believed to be a
member of the S100 protein family, is highly up-regulated in
psoriatic skin and in primary human keratinocytes undergoing
abnormal differentiation.

The plasmid pT7H6FX-PS.4 (kindly provided by Dr. P. Madsen,
Institute of Medical Biochemistry, University of Aarhus,
Denmark) has previously been described by Hoffmann et al.
(1994). The nucleotide sequence encoding the psoriasin protein
from Ser2 to Gln101 is in the 5'-end linked to the nucleotide
sequence encoding the amino acid sequence MGSHHHHHHGSIEGR
(SEQ ID NO: 48). A map of pT7H6FX-PS.4 is given in fig. 24 and
the amino acid sequence of human psoriasin is listed in fig. 25
(in SEQ ID NO: 58 is shown the amino acid sequence encoded by
the full length reading frame).

Recombinant human psoriasin was grown and expressed from the
plasmid pT7H6FX-PS.4 in E. coli BL21 cells and total cellular
protein extracted as described (Hoffmann et al., 1994).
Ethanol precipitated total protein was dissolved in a buffer
containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and
50 mM dithioerythritol. Following gel filtration on Sephadex
G-25 (Pharmacia, LKB, Sweden) into 8 M urea, 0.5 M NaCl, 50
mM Tris-HCl pH 8 and 5 mM 2-mercaptoethanol, the crude protein
preparation was applied to a Ni2+-activated NTA-agarose
column (Ni2+NTA-agarose) for purification (Hochuli et al.,
1988) of the fusion protein, MGSHHHHHHGSIEGR-psoriasin
(wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), and subsequently
to undergo the cyclic folding procedure.

Preparation and "charging" of the Ni2+NTA-agarose column is
described under Example 1.

All buffers prepared for liquid chromatography were degassed 
under vacuum prior to addition of reductant and/or use. 

Upon application of the crude protein extract on the
Ni2+NTA-agarose column, the fusion protein,
MGSHHHHHHGSIEGR-psoriasin (wherein MGSHHHHHHGSIEGR is SEQ ID
NO: 48), was purified from the majority of E. coli and λ phage
proteins by washing with one column volume of the loading
buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl,
and 5 mM 2-mercaptoethanol until the optical density (OD) at
280 nm of the eluate was stable.

The fusion protein was refolded on the Ni2+NTA-agarose column
using a gradient manager profile as described in table 4 and
0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2 and 1.0 mM/0.1 mM
reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M
NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl2 and 5 mM reduced
glutathione as buffer B. The reduced/oxidized glutathione
solution was freshly prepared as a 200 times stock solution by
addition of 9.9 M H2O2 to a stirred solution of 0.2 M reduced
glutathione before addition to buffer A.

After completion of the cyclic folding procedure the
psoriasin fusion protein was eluted from the Ni2+NTA-agarose
column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl,
10 mM EDTA pH 8. Fusion protein that was aggregated and
precipitated on the Ni2+NTA-agarose column was eluted in
buffer B.

Approximately 95% of the fusion protein material was eluted
by the non-denaturing elution buffer. As judged by
non-reducing SDS-PAGE analysis, 75% of the soluble fusion
protein material appeared to be monomeric, yielding an overall
efficiency of the folding procedure of approximately 70%. The
efficiency of the previously described refolding procedure
for production of recombinant human psoriasin (Hoffmann et
al., 1994) was estimated to be less than 25%.
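
The overall efficiency quoted here is the product of two
measured fractions: the share of fusion protein eluted by the
non-denaturing buffer, and the share of that soluble material
that runs as monomer on non-reducing SDS-PAGE. A minimal sketch
of that arithmetic follows; the function name is hypothetical
and is introduced only for illustration.

```python
def overall_folding_efficiency(soluble_fraction, monomer_fraction):
    """Fraction of all fusion protein recovered as soluble monomer.

    soluble_fraction: share eluted by the non-denaturing buffer (0..1)
    monomer_fraction: share of the soluble material that is monomeric (0..1)
    """
    for name, value in (("soluble_fraction", soluble_fraction),
                        ("monomer_fraction", monomer_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return soluble_fraction * monomer_fraction


# Psoriasin: 95% soluble, of which 75% monomeric.
print(f"psoriasin: {overall_folding_efficiency(0.95, 0.75):.0%}")
```

The same product applied to the figures of Example 7 (about 50%
eluted in aqueous buffer, half of that monomeric) gives an
overall efficiency of about 25%.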

The psoriasin fusion protein was cleaved with FXa in a molar
ratio of 100:1 for 48 hrs at room temperature. After gel
filtration into a buffer containing 20 mM Na-acetate pH 5 and
20 mM NaCl on Sephadex G-25 the protein sample was applied onto
an S-Sepharose ion exchange column (Pharmacia). Monomeric
recombinant psoriasin was eluted over 5 column volumes with a
linear gradient from 20 mM Na-acetate pH 5, 20 mM NaCl to 0.5
M NaCl. Monomeric psoriasin eluted at 150 mM NaCl. Dimeric
and higher order multimers of psoriasin together with
uncleaved fusion protein eluted later in the gradient.
Fractions containing the cleaved purified recombinant protein
were gel filtered on Sephadex G25 into a buffer containing
150 mM NaCl, 10 mM Tris-HCl pH 7.4 and stored at 4°C.

EXAMPLE 12 



Evaluation procedure for suitability testing of thiol
compounds for use as reducing agents in cyclic refolding and
determination of optimal levels of denaturants and disulphide
reshuffling agents for optimization of cyclic refolding
procedures.

In order to improve the yield of correctly folded protein
obtainable from cyclic refolding, the number of productive
cycles should be maximized (see SUMMARY OF THE INVENTION).
Productive cycles are characterized by steps of denaturation
where misfolded protein, en route to dead-end aggregate
conformational states, is salvaged into unfolded
conformational states while most of the already correctly
folded protein remains in conformational states able to snap
back into the refolded state during the refolding step of the
cycle.

A number of disulphide bridge containing proteins, like
β2-microglobulin, are known to refold with high efficiency
(>95%) when subjected to high levels of denaturing agents as
long as their disulphide bridges remain intact.



This example describes how to evaluate suitability of a thiol
compound for use in cyclic refolding on the basis of its
ability to discriminate correct from incorrect disulphide
bridges and how to optimize levels of denaturing agent and/or
reducing agent to be used in the denaturation steps in order
to maximize the number of productive cycles. As model system
we chose a mixture of mono-, di- and multimeric forms of
purified recombinant human β2-microglobulin. Our specific aim
was to analyze the stability of different topological forms
of human β2-microglobulin against reduction by five different
reducing agents at various concentrations of denaturing
agent.

Human β2-microglobulin (produced as described in Example 13)
in 6 M guanidinium chloride, 50 mM Tris-HCl and 10 mM
2-mercaptoethanol pH 8 was gel filtered into non-denaturing
buffer (50 mM Tris-HCl, 0.5 M NaCl pH 8). Only a fraction of
the protein in the sample was soluble in the non-denaturing
buffer. After 48 hours exposure to air, the protein solution
appeared unclear. Non-reducing SDS-PAGE analysis showed that
most of the protein had been oxidized into multimeric forms
and only a small fraction was oxidized and monomeric (fig.
27, lane 1).

The protein solution was aliquoted into a number of tubes and
varying amounts of urea added while keeping the concentration
of protein and salt at a constant level.

Reducing agent, either glutathione, cysteine ethyl ester,
N-acetyl-L-cysteine, mercaptosuccinic acid or 2-mercaptoethanol,
was added to the ensemble of protein samples with varying
urea concentrations. Each reducing agent was added to a
final concentration of 4 mM. The protein samples were
incubated at room temperature for 10 min and then free thiol
groups were blocked by addition of iodoacetic acid to a final
concentration of 12 mM. Finally, the protein samples were
analyzed by non-reducing SDS-PAGE (fig. 27 - 32). The
compositions of the test samples used in the non-reducing
SDS-PAGE as well as the results are given in the following
tables; in the rows indicating the ability of the chosen
reducing agent to reduce disulphide bridges the marking "+++"
indicates good ability, "++" indicates intermediate ability,
"+" indicates weak ability, whereas no marking indicates that
no measurable effect could be observed.
Composition of samples used in SDS-PAGE of fig. 27

Test no.                      1    2    3    4    5    6    7    8    9   10   11
µl protein solution          36   36   36   36   36   36   36   36   36   36   36
µl Buffer A                 160  160  140  120  100   80   70   60   50   40   20
µl Buffer B                   0    0   20   40   60   80   90  100  110  120  140
µl GSH                        0    4    4    4    4    4    4    4    4    4    4
M urea                        0    0    1    2    3    4  4.5    5  5.5    6    7
Ability to reduce wrong
disulphide bridges                      +    +   ++   ++  +++  +++  +++  +++  +++
Ability to reduce correct disulphide bridges    + + + +

Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl
Buffer B: 10 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl
GSH: 0.2 M glutathione
Protein solution: 2 mg/ml hβ2m, 50 mM Tris-HCl pH 8, 0.5 M NaCl
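
The final concentrations in each test sample follow from the
pipetted volumes and the stock concentrations given in the table
legend (10 M urea in Buffer B, 0.2 M thiol stock, 2 mg/ml
protein solution). The sketch below reproduces that arithmetic;
the class and function names are hypothetical, and it assumes
that volumes are additive on mixing.

```python
from dataclasses import dataclass

UREA_STOCK_M = 10.0         # urea concentration of Buffer B
THIOL_STOCK_M = 0.2         # concentration of the reducing agent stock
PROTEIN_STOCK_MG_ML = 2.0   # concentration of the protein solution


@dataclass(frozen=True)
class Sample:
    """Pipetted volumes of one test sample, in microlitres."""
    protein_ul: float
    buffer_a_ul: float
    buffer_b_ul: float
    thiol_ul: float


def final_concentrations(s):
    """Final concentrations, assuming volumes are simply additive."""
    total_ul = s.protein_ul + s.buffer_a_ul + s.buffer_b_ul + s.thiol_ul
    return {
        "total_ul": total_ul,
        "urea_M": UREA_STOCK_M * s.buffer_b_ul / total_ul,
        "thiol_mM": 1000.0 * THIOL_STOCK_M * s.thiol_ul / total_ul,
        "protein_mg_ml": PROTEIN_STOCK_MG_ML * s.protein_ul / total_ul,
    }


# A sample with 36 µl protein, 80 µl Buffer A, 80 µl Buffer B, 4 µl thiol.
result = final_concentrations(Sample(36, 80, 80, 4))
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

With 200 µl total volume, 80 µl of Buffer B corresponds to 4 M
urea and 4 µl of thiol stock to the 4 mM final concentration
stated in the text, while the protein stays at 0.36 mg/ml in
every sample that contains reducing agent.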



Composition of samples used in SDS-PAGE of fig. 28

Test no.                      1    2    3    4    5    6    7    8    9
µl protein solution          36   36   36   36   36   36   36   36   36
µl Buffer A                 160  160  140  120  100   80   60   40   20
µl Buffer B                   0    0   20   40   60   80  100  120  140
µl CE                         0    4    4    4    4    4    4    4    4
M urea                        0    0    1    2    3    4    5    6    7
Ability to reduce wrong
disulphide bridges                ++   ++   ++  +++  +++  +++  +++  +++
Ability to reduce correct
disulphide bridges                                         ++  +++  +++

Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl
Buffer B: 10 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl
CE: 0.2 M L-cysteine ethyl ester
Protein solution: 2 mg/ml hβ2m, 50 mM Tris-HCl pH 8, 0.5 M NaCl
Composition of samples used in SDS-PAGE of fig. 29

Test no.                      1    2    3    4    5    6    7    8    9
µl protein solution          36   36   36   36   36   36   36   36   36
µl Buffer A                 160  160  140  120  100   80   60   40   20
µl Buffer B                   0    0   20   40   60   80  100  120  140
µl ME                         0    4    4    4    4    4    4    4    4
M urea                        0    0    1    2    3    4    5    6    7
Ability to reduce wrong
disulphide bridges                ++   ++   ++  +++  +++  +++  +++  +++
Ability to reduce correct
disulphide bridges                                    +   ++  +++  +++

Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl
Buffer B: 10 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl
ME: 0.2 M 2-mercaptoethanol
Protein solution: 2 mg/ml hβ2m, 50 mM Tris-HCl pH 8, 0.5 M NaCl



Composition of samples used in SDS-PAGE of fig. 30

Test no.                      1    2    3    4    5    6    7    8    9
µl protein solution          36   36   36   36   36   36   36   36   36
µl Buffer A                 160  160  140  120  100   80   60   40   20
µl Buffer B                   0    0   20   40   60   80  100  120  140
µl MSA                        0    4    4    4    4    4    4    4    4
M urea                        0    0    1    2    3    4    5    6    7
Ability to reduce wrong
disulphide bridges                ++   ++   ++   ++   ++  +++  +++  +++
Ability to reduce correct
disulphide bridges                                         ++  +++  +++

Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl
Buffer B: 10 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl
MSA: 0.2 M mercaptosuccinic acid
Protein solution: 2 mg/ml hβ2m, 50 mM Tris-HCl pH 8, 0.5 M NaCl
Composition of samples used in SDS-PAGE of fig. 31

Test no.                      1    2    3    4    5    6    7    8    9
µl protein solution          36   36   36   36   36   36   36   36   36
µl Buffer A                 160  160  140  120  100   80   60   40   20
µl Buffer B                   0    0   20   40   60   80  100  120  140
µl AC                         0    4    4    4    4    4    4    4    4
M urea                        0    0    1    2    3    4    5    6    7
Ability to reduce wrong
disulphide bridges                 +   ++   ++  +++  +++  +++  +++  +++
Ability to reduce correct
disulphide bridges                               +   ++  +++  +++  +++

Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl
Buffer B: 10 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl
AC: 0.2 M N-acetyl-L-cysteine
Protein solution: 2 mg/ml hβ2m, 50 mM Tris-HCl pH 8, 0.5 M NaCl

The different topological forms of β2-m may be separated by
non-reducing SDS-PAGE. The fastest migrating band represents
the oxidized monomeric form. This band is immediately followed
by the reduced β2-m with a slightly slower migration rate,
whereas the multimeric forms of the protein migrate much more
slowly in the gel.

In this analysis we probe the ability of each of the five
reducing agents tested to reduce the disulphide bridges of
multimeric forms of β2-microglobulin without significantly
reducing the correctly formed disulphide bridge of the
monomeric oxidized form.

The results from the analyses (fig. 27 - 32) are, in summary,
as follows: N-acetyl-L-cysteine and mercaptosuccinic acid
are, under the conditions used, essentially unable to
discriminate between correct and incorrect disulphide bridges.

Glutathione, cysteine ethyl ester and 2-mercaptoethanol are
all capable of - within 10 min and within individual
characteristic ranges of urea concentrations - significantly
reducing disulphide bridges of multimeric forms while most of
the oxidised monomeric β2-m remains in the oxidised form.
Glutathione clearly has the capacity to reduce incorrect
disulphide bridges selectively at higher concentrations of
urea than cysteine ethyl ester and 2-mercaptoethanol, and
glutathione would therefore, among the thiols tested, be the
reducing agent of choice for cyclic refolding of human
β2-microglobulin. As a consequence of these experiments the
concentration of urea in the reducing buffer B for the
refolding procedure used in Example 13 was lowered from 8 M
(Example 1) to 6 M, which led to an improvement of overall
refolding yield of human β2-microglobulin from 53% to 87%.

EXAMPLE 13 

Refolding of purified human β2-microglobulin: Comparative
analysis of three refolding procedures

The following set of experiments was undertaken to obtain
comparable quantitative data to evaluate the importance of
cycling for refolding yield versus simple refolding
procedures involving a stepwise or a gradual one-pass
transition from strongly denaturing and reducing conditions
to non-denaturing and non-reducing conditions.

Purified refolded recombinant human β2-microglobulin fusion
protein, obtained as described in EXAMPLE 1, was reduced and
denatured to obtain starting materials devoid of impurities,
such as proteolytic breakdown products or minor fractions of
fusion protein damaged by irreversible oxidation or other
chemical derivatization.

In a first step the optimization procedure described in
EXAMPLE 12 was used to modify the conditions for cyclic
refolding described in EXAMPLE 1 to increase the number of
productive cycles. The optimized refolding protocol was
identical to that described in EXAMPLE 1, as were buffers and
other experimental parameters, except that the Buffer B in
the present experiments was 6 M urea, 50 mM Tris-HCl pH 8,
0.5 M NaCl, 4 mM glutathione.

Three batches of pure fusion protein were refolded while
attached to Ni++-loaded NTA-agarose as described in EXAMPLE
1, using the present Buffer B composition. One batch was
submitted to buffer cycling as described in EXAMPLE 1; for
batches two and three cycling was replaced by a monotonic
linear buffer gradient (100% B to 0% B over 24 hours) and a
step gradient (100% B to 0% B in one step, followed by 0% B
buffer for 24 hours), respectively. In each refolding
experiment all of the polypeptide material was recovered as
described in EXAMPLE 1 as a soluble fraction elutable under
non-denaturing conditions and a remaining insoluble fraction
elutable only under denaturing and reducing conditions. The
yields of correctly folded fusion protein were then measured
by quantitative densitometric analysis (Optical scanner HW
and GS-370 Densitometric Analysis SW package from Hoeffer
Scientific, CA, USA) of Coomassie stained SDS-PAGE gels on
which suitably diluted measured aliquots of soluble and
insoluble fractions had been separated under reducing or
non-reducing conditions, as required to allow separation of
correctly disulphide-bridged monomer from soluble polymers in
soluble fractions. Where required to obtain reliable
densitometric data both for intense and faint bands in a gel
lane, several sample dilutions were scanned and analyzed to
obtain rescaled data sets.

Experimental details and results 

Purified denatured and reduced fusion protein: 

A batch of human β2-microglobulin fusion protein was refolded
as described in EXAMPLE 1. 96% of the fusion protein was
recovered in the soluble fraction (Fig 32, lanes 2-5). 56% of
this soluble fraction was in the monomeric and
disulphide-bridged form. Hence, the overall refolding
efficiency obtained was 53%. Monomeric fusion protein was
purified from multimers by ion exchange chromatography on
S-Sepharose (Pharmacia, Sweden): The soluble fraction
obtained after refolding was gel filtered on Sephadex G-25
(Pharmacia, Sweden) into a buffer containing 5 mM NaCl and
5 mM Tris-HCl pH 8, diluted to double volume with water and
then applied to the S-Sepharose column, which was then eluted
using a gradient (5 column volumes from 2.5 mM Tris-HCl pH 8,
2.5 mM NaCl to 25 mM Tris-HCl pH 8, 100 mM NaCl). The
monomeric correctly folded fusion protein purified to >95%
purity (Fig. 32, lanes 6 and 7) was then made 6 M in
guanidinium hydrochloride and 0.1 M in DTE, gel filtered into
a buffer containing 8 M urea, 50 mM Tris-HCl pH 8, 1 M NaCl
and 10 mM 2-mercaptoethanol and then divided into aliquots to
be used as starting material for the refolding experiments
described below.

Cyclic refolding of purified fusion protein:

An aliquot of denatured reduced fusion protein was applied to
a Ni++-loaded NTA column which was then washed with one
column volume of a buffer containing 6 M guanidinium
hydrochloride, 50 mM Tris-HCl pH 8 and 10 mM
2-mercaptoethanol.

The fusion protein was then subjected to buffer cycling
according to the scheme shown in Table 1 using Buffer A:
50 mM Tris-HCl pH 8, 0.5 M NaCl and 3.2 mM/0.4 mM
reduced/oxidized glutathione and Buffer B: 50 mM Tris-HCl
pH 8, 0.5 M NaCl, 6 M urea and 4 mM reduced glutathione. After
completion of buffer cycling the fusion protein was recovered
quantitatively in a soluble form by elution of the column
with a buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and
20 mM EDTA. 87% was obtained in the correct monomeric
disulphide-bridged form (Fig. 32, lanes 8 and 9).

Refolding of purified fusion protein by linear gradient: 

An aliquot of denatured reduced fusion protein was applied to
a Ni++-loaded NTA column which was then washed with one
column volume of a buffer containing 6 M guanidinium
hydrochloride, 50 mM Tris-HCl pH 8 and 10 mM
2-mercaptoethanol followed by 1 column volume of a buffer
containing 50 mM Tris-HCl pH 8, 0.5 M NaCl, 6 M urea and 4 mM
reduced glutathione.

A 24 hour linear gradient from 100% B to 100% A was then
applied at 2 ml/min, using Buffer A: 50 mM Tris-HCl pH 8,
0.5 M NaCl and 3.2 mM/0.4 mM reduced/oxidized glutathione and
Buffer B: 50 mM Tris-HCl pH 8, 0.5 M NaCl, 6 M urea and 4 mM
reduced glutathione. After completion of the gradient the
soluble fraction of fusion protein was eluted in a buffer
containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and 20 mM EDTA.
The remaining insoluble fraction was extracted from the
column in a buffer containing 50 mM Tris-HCl pH 8, 1 M NaCl,
8 M urea, 10 mM 2-mercaptoethanol and 20 mM EDTA.

48% of the fusion protein was recovered in the soluble
fraction and 60% of the soluble fraction was recovered in the
correct monomeric disulphide-bridged form. The overall
efficiency of folding obtained was therefore 29% (Fig 33,
lanes 5-7).
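
For orientation, the nominal urea concentration delivered to the column during the 24 hour linear gradient can be sketched as follows. The sketch assumes ideal linear mixing of Buffer B (6 M urea) into Buffer A (no urea) and ignores any dead volume in the system; the function names are hypothetical and only illustrate the arithmetic.

```python
GRADIENT_HOURS = 24.0      # duration of the linear gradient
UREA_IN_BUFFER_B_M = 6.0   # Buffer B contains 6 M urea; Buffer A contains none
FLOW_ML_PER_MIN = 2.0      # flow rate stated in the text


def percent_b(t_hours):
    """Percentage of Buffer B at time t for a linear 100% B -> 0% B gradient."""
    t = min(max(t_hours, 0.0), GRADIENT_HOURS)  # clamp to the gradient window
    return 100.0 * (1.0 - t / GRADIENT_HOURS)


def urea_molar(t_hours):
    """Nominal urea concentration (M) at time t, assuming ideal mixing."""
    return UREA_IN_BUFFER_B_M * percent_b(t_hours) / 100.0


# Total buffer volume pumped over the whole gradient.
total_volume_ml = FLOW_ML_PER_MIN * 60.0 * GRADIENT_HOURS

for hour in (0, 6, 12, 18, 24):
    print(f"t = {hour:2d} h: {percent_b(hour):5.1f}% B, {urea_molar(hour):.2f} M urea")
print(f"total volume: {total_volume_ml:.0f} ml")
```

In contrast to the cyclic protocol, the protein in this single-pass scheme passes through each urea concentration only once.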

Refolding of purified fusion protein by buffer step: 

An aliquot of denatured reduced fusion protein was applied to
a Ni++-loaded NTA column which was then washed with one
column volume of a buffer containing 6 M guanidinium
hydrochloride, 50 mM Tris-HCl pH 8 and 10 mM
2-mercaptoethanol.

Buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and
3.2 mM/0.4 mM reduced/oxidized glutathione was then applied
to the column at 2 ml/min for 24 hours before recovering the
soluble fraction of fusion protein in a buffer containing
50 mM Tris-HCl pH 8, 0.5 M NaCl and 20 mM EDTA. The remaining
insoluble fraction was extracted from the column in a buffer
containing 50 mM Tris-HCl pH 8, 1 M NaCl, 8 M urea, 10 mM
2-mercaptoethanol and 20 mM EDTA.

34% of the fusion protein was recovered in the soluble
fraction and 28% of the soluble fraction was recovered in the
correct monomeric disulphide-bridged form. The overall
efficiency of folding obtained was therefore 9.5% (Fig 33,
lanes 1-3).

Conclusions 

In summary, using human β2-microglobulin as a model protein,
it may be concluded that (a) straightforward buffer
optimization and improved purification of fusion protein
prior to cyclic refolding increased refolding yield
significantly (from 53% to 87%) and (b) progressive
denaturation-renaturation cycling is superior to single-pass
refolding under otherwise comparable experimental conditions
by a very large factor (87% versus 29% or 9.5% yields).
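
The overall efficiencies quoted above are the product of the fraction recovered as soluble protein and the fraction of that soluble protein present as correctly disulphide-bridged monomer. A minimal Python sketch of this bookkeeping is given below. The input fractions are those stated in the text; treating "recovered quantitatively" as a soluble fraction of 1.00 is an assumption, and all names are illustrative.

```python
def overall_yield(soluble_fraction, monomer_fraction):
    """Overall refolding efficiency: soluble share times correctly folded share."""
    return soluble_fraction * monomer_fraction


# (fraction recovered as soluble protein,
#  fraction of the soluble protein that is correctly bridged monomer)
procedures = {
    "cyclic, optimized Buffer B": (1.00, 0.87),  # 1.00 is an assumption, see lead-in
    "linear gradient": (0.48, 0.60),
    "buffer step": (0.34, 0.28),
}

yields = {name: overall_yield(*fractions) for name, fractions in procedures.items()}
reference = yields["cyclic, optimized Buffer B"]

for name, value in yields.items():
    print(f"{name}: {100 * value:.1f}% overall, cyclic/this = {reference / value:.1f}")
```

With these inputs the products are 28.8% and 9.5% for the linear gradient and the buffer step, in line with the 29% and 9.5% quoted, so that cycling gives roughly three and nine times the respective single-pass yields.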
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Denzyme ApS 

(B) STREET: Gustav Wieds Vej 10 

(C) CITY: Aarhus C 

(E) COUNTRY: Denmark 

(F) POSTAL CODE (ZIP) : 8000 

(ii) TITLE OF INVENTION: Improved method for the refolding of proteins 
(iii) NUMBER OF SEQUENCES: 47 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER : IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1554 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: YES 

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bos taurus 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 76..1551

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

AGCCTGGGCG AGCGGACCTT GCCCTGGAGG CCTGTTGCGG CAGGGACTCA CGGCTGTCCT 60 

CGGAAGGGCC CCACC ATG GCG GGC CTG CTG CAT CTC GTT CTG CTC AGC ACC 111 
Met Ala Gly Leu Leu His Leu Val Leu Leu Ser Thr 
1 5 10

GCC CTG GGC GGC CTC CTG CGG CCG GCG GGG AGC GTG TTC CTG CCC CGG 159 
Ala Leu Gly Gly Leu Leu Arg Pro Ala Gly Ser Val Phe Leu Pro Arg 
15 20 25 

GAC CAG GCC CAC CGT GTC CTG CAG AGA GCC CGC AGG GCC AAC TCA TTC 207 
Asp Gln Ala His Arg Val Leu Gln Arg Ala Arg Arg Ala Asn Ser Phe
30 35 40 
TTG GAG GAG GTG AAG CAG GGA AAC CTG GAG CGA GAG TGC CTG GAG GAG 255
Leu Glu Glu Val Lys Gln Gly Asn Leu Glu Arg Glu Cys Leu Glu Glu
45 50 55 60

GCC TGC TCA CTA GAG GAG GCC CGC GAG GTC TTC GAG GAC GCA GAG CAG 303
Ala Cys Ser Leu Glu Glu Ala Arg Glu Val Phe Glu Asp Ala Glu Gln
65 70 75

ACG GAT GAA TTC TGG AGT AAA TAC AAA GAT GGA GAC CAG TGT GAA GGC 351
Thr Asp Glu Phe Trp Ser Lys Tyr Lys Asp Gly Asp Gln Cys Glu Gly
80 85 90

CAC CCG TGC CTG AAT CAG GGC CAC TGT AAA GAC GGC ATC GGA GAC TAC 399
His Pro Cys Leu Asn Gln Gly His Cys Lys Asp Gly Ile Gly Asp Tyr
95 100 105

ACC TGC ACC TGT GCG GAA GGG TTT GAA GGC AAA AAC TGC GAG TTC TCC 447
Thr Cys Thr Cys Ala Glu Gly Phe Glu Gly Lys Asn Cys Glu Phe Ser
110 115 120

ACG CGT GAG ATC TGC AGC CTG GAC AAT GGA GGC TGC GAC CAG TTC TGC 495
Thr Arg Glu Ile Cys Ser Leu Asp Asn Gly Gly Cys Asp Gln Phe Cys
125 130 135 140

AGG GAG GAG CGC AGC GAG GTG CGG TGC TCC TGC GCG CAC GGC TAC GTG 543
Arg Glu Glu Arg Ser Glu Val Arg Cys Ser Cys Ala His Gly Tyr Val
145 150 155

CTG GGC GAC GAC AGC AAG TCC TGC GTG TCC ACA GAG CGC TTC CCC TGT 591
Leu Gly Asp Asp Ser Lys Ser Cys Val Ser Thr Glu Arg Phe Pro Cys
160 165 170

GGG AAG TTC ACG CAG GGA CGC AGC CGG CGG TGG GCC ATC CAC ACC AGC 639
Gly Lys Phe Thr Gln Gly Arg Ser Arg Arg Trp Ala Ile His Thr Ser
175 180 185

GAG GAC GCG CTT GAC GCC AGC GAG CTG GAG CAC TAC GAC CCT GCA GAC 687
Glu Asp Ala Leu Asp Ala Ser Glu Leu Glu His Tyr Asp Pro Ala Asp
190 195 200

CTG AGC CCC ACA GAG AGC TCC TTG GAC CTG CTG GGC CTC AAC AGG ACC 735
Leu Ser Pro Thr Glu Ser Ser Leu Asp Leu Leu Gly Leu Asn Arg Thr
205 210 215 220

GAG CCC AGC GCC GGG GAG GAC GGC AGC CAG GTG GTC CGG ATA GTG GGC 783
Glu Pro Ser Ala Gly Glu Asp Gly Ser Gln Val Val Arg Ile Val Gly
225 230 235

GGC AGG GAC TGC GCG GAG GGC GAG TGC CCA TGG CAG GCT CTG CTG GTC 831
Gly Arg Asp Cys Ala Glu Gly Glu Cys Pro Trp Gln Ala Leu Leu Val
240 245 250

AAC GAA GAG AAC GAG GGA TTC TGC GGG GGC ACC ATC CTG AAC GAG TTC 879
Asn Glu Glu Asn Glu Gly Phe Cys Gly Gly Thr Ile Leu Asn Glu Phe
255 260 265

TAC GTC CTC ACG GCT GCC CAC TGC CTG CAC CAG GCC AAG AGG TTC ACG 927
Tyr Val Leu Thr Ala Ala His Cys Leu His Gln Ala Lys Arg Phe Thr
270 275 280 

GTG AGG GTC GGC GAC CGG AAC ACA GAG CAG GAG GAG GGC AAC GAG ATG 975 
Val Arg Val Gly Asp Arg Asn Thr Glu Gln Glu Glu Gly Asn Glu Met
285 290 295 300 

GCA CAC GAG GTG GAG ATG ACT GTG AAG CAC AGC CGC TTT GTC AAG GAG 1023 
Ala His Glu Val Glu Met Thr Val Lys His Ser Arg Phe Val Lys Glu 
305 310 315 

ACC TAC GAC TTC GAC ATC GCG GTG CTG AGG CTC AAG ACG CCC ATC CGG 1071 
Thr Tyr Asp Phe Asp Ile Ala Val Leu Arg Leu Lys Thr Pro Ile Arg
320 325 330 

TTC CGC CGG AAC GTG GCG CCC GCC TGC CTG CCC GAG AAG GAC TGG GCG 1119 
Phe Arg Arg Asn Val Ala Pro Ala Cys Leu Pro Glu Lys Asp Trp Ala 
335 340 345 

GAG GCC ACG CTG ATG ACC CAG AAG ACG GGC ATC GTC AGC GGC TTC GGG 1167 
Glu Ala Thr Leu Met Thr Gln Lys Thr Gly Ile Val Ser Gly Phe Gly
350 355 360 

CGC ACG CAC GAG AAG GGC CGC CTG TCG TCC ACG CTC AAG ATG CTG GAG 1215 
Arg Thr His Glu Lys Gly Arg Leu Ser Ser Thr Leu Lys Met Leu Glu 
365 370 375 380 

GTG CCC TAC GTG GAC CGC AGC ACC TGT AAG CTG TCC AGC AGC TTC ACC 1263 
Val Pro Tyr Val Asp Arg Ser Thr Cys Lys Leu Ser Ser Ser Phe Thr 
385 390 395 

ATT ACG CCC AAC ATG TTC TGC GCC GGC TAC GAC ACC CAG CCC GAG GAC 1311 
Ile Thr Pro Asn Met Phe Cys Ala Gly Tyr Asp Thr Gln Pro Glu Asp
400 405 410 

GCC TGC CAG GGC GAC AGT GGC GGC CCC CAC GTC ACC CGC TTC AAG GAC 1359 
Ala Cys Gln Gly Asp Ser Gly Gly Pro His Val Thr Arg Phe Lys Asp
415 420 425

ACC TAC TTC GTC ACA GGC ATC GTC AGC TGG GGA GAA GGG TGC GCG CGC 1407 
Thr Tyr Phe Val Thr Gly Ile Val Ser Trp Gly Glu Gly Cys Ala Arg
430 435 440 

AAG GGC AAG TTC GGC GTC TAC ACC AAG GTC TCC AAC TTC CTC AAG TGG 1455 
Lys Gly Lys Phe Gly Val Tyr Thr Lys Val Ser Asn Phe Leu Lys Trp 
445 450 455 460 

ATC GAC AAG ATC ATG AAG GCC AGG GCA GGG GCC GCG GGC AGC CGC GGC 1503 
Ile Asp Lys Ile Met Lys Ala Arg Ala Gly Ala Ala Gly Ser Arg Gly
465 470 475 

CAC AGT GAA GCC CCT GCC ACC TGG ACG GTC CCG CCG CCC CTC CCC CTC 1551 
His Ser Glu Ala Pro Ala Thr Trp Thr Val Pro Pro Pro Leu Pro Leu 
480 485 490 



TAA 1554
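
The feature table of SEQ ID NO: 1 can be cross-checked against the length of SEQ ID NO: 2 with a few lines of Python. The sketch below uses only numbers and bases quoted in the listing (1554 bp in total, CDS at positions 76..1551, and the first 78 bases as printed); the variable names are illustrative.

```python
# First 78 bases of SEQ ID NO: 1 as printed in the listing (spaces removed):
# six blocks of ten, then fifteen untranslated bases, then the first codon.
five_prime = (
    "AGCCTGGGCG" "AGCGGACCTT" "GCCCTGGAGG"
    "CCTGTTGCGG" "CAGGGACTCA" "CGGCTGTCCT"
    "CGGAAGGGCC" "CCACC" "ATG"
)

# 1-based coordinates from the FEATURE block and the stated total length.
cds_start, cds_end, total_length = 76, 1551, 1554

cds_length = cds_end - cds_start + 1
in_frame = cds_length % 3 == 0
codons = cds_length // 3

# The codon starting at position 76 (index 75 when counting from zero).
start_codon = five_prime[cds_start - 1:cds_start + 2]

# Bases remaining after the CDS; the listing prints TAA there.
bases_after_cds = total_length - cds_end

print(in_frame, codons, start_codon, bases_after_cds)
```

The CDS is in frame and spans 492 codons, which agrees with the 492 amino acids stated for SEQ ID NO: 2. Position 76 carries the ATG start codon, and the three remaining bases correspond to the TAA stop codon printed at 1552..1554.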
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 492 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ala Gly Leu Leu His Leu Val Leu Leu Ser Thr Ala Leu Gly Gly
1 5 10 15

Leu Leu Arg Pro Ala Gly Ser Val Phe Leu Pro Arg Asp Gln Ala His
20 25 30

Arg Val Leu Gln Arg Ala Arg Arg Ala Asn Ser Phe Leu Glu Glu Val
35 40 45

Lys Gln Gly Asn Leu Glu Arg Glu Cys Leu Glu Glu Ala Cys Ser Leu
50 55 60

Glu Glu Ala Arg Glu Val Phe Glu Asp Ala Glu Gln Thr Asp Glu Phe
65 70 75 80

Trp Ser Lys Tyr Lys Asp Gly Asp Gln Cys Glu Gly His Pro Cys Leu
85 90 95

Asn Gln Gly His Cys Lys Asp Gly Ile Gly Asp Tyr Thr Cys Thr Cys
100 105 110

Ala Glu Gly Phe Glu Gly Lys Asn Cys Glu Phe Ser Thr Arg Glu Ile
115 120 125

Cys Ser Leu Asp Asn Gly Gly Cys Asp Gln Phe Cys Arg Glu Glu Arg
130 135 140

Ser Glu Val Arg Cys Ser Cys Ala His Gly Tyr Val Leu Gly Asp Asp
145 150 155 160

Ser Lys Ser Cys Val Ser Thr Glu Arg Phe Pro Cys Gly Lys Phe Thr
165 170 175

Gln Gly Arg Ser Arg Arg Trp Ala Ile His Thr Ser Glu Asp Ala Leu
180 185 190

Asp Ala Ser Glu Leu Glu His Tyr Asp Pro Ala Asp Leu Ser Pro Thr
195 200 205

Glu Ser Ser Leu Asp Leu Leu Gly Leu Asn Arg Thr Glu Pro Ser Ala
210 215 220

Gly Glu Asp Gly Ser Gln Val Val Arg Ile Val Gly Gly Arg Asp Cys
225 230 235 240

Ala Glu Gly Glu Cys Pro Trp Gln Ala Leu Leu Val Asn Glu Glu Asn
245 250 255

Glu Gly Phe Cys Gly Gly Thr Ile Leu Asn Glu Phe Tyr Val Leu Thr
260 265 270

Ala Ala His Cys Leu His Gln Ala Lys Arg Phe Thr Val Arg Val Gly
275 280 285

Asp Arg Asn Thr Glu Gln Glu Glu Gly Asn Glu Met Ala His Glu Val
290 295 300

Glu Met Thr Val Lys His Ser Arg Phe Val Lys Glu Thr Tyr Asp Phe
305 310 315 320

Asp Ile Ala Val Leu Arg Leu Lys Thr Pro Ile Arg Phe Arg Arg Asn
325 330 335

Val Ala Pro Ala Cys Leu Pro Glu Lys Asp Trp Ala Glu Ala Thr Leu
340 345 350

Met Thr Gln Lys Thr Gly Ile Val Ser Gly Phe Gly Arg Thr His Glu
355 360 365

Lys Gly Arg Leu Ser Ser Thr Leu Lys Met Leu Glu Val Pro Tyr Val
370 375 380

Asp Arg Ser Thr Cys Lys Leu Ser Ser Ser Phe Thr Ile Thr Pro Asn
385 390 395 400

Met Phe Cys Ala Gly Tyr Asp Thr Gln Pro Glu Asp Ala Cys Gln Gly
405 410 415

Asp Ser Gly Gly Pro His Val Thr Arg Phe Lys Asp Thr Tyr Phe Val
420 425 430

Thr Gly Ile Val Ser Trp Gly Glu Gly Cys Ala Arg Lys Gly Lys Phe
435 440 445

Gly Val Tyr Thr Lys Val Ser Asn Phe Leu Lys Trp Ile Asp Lys Ile
450 455 460

Met Lys Ala Arg Ala Gly Ala Ala Gly Ser Arg Gly His Ser Glu Ala
465 470 475 480

Pro Ala Thr Trp Thr Val Pro Pro Pro Leu Pro Leu
485 490



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CGTCCTGGAT CCATCGAGGG TAGAATCCAG CGTACTCCAA AG



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GCGAAGCTTG ATCACATGTC TCG 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CGTCCTGGAT CCATCGAGGG TAGAATCCAG AAAACCCCTC AAAT 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GCGAAGCTTA CATGTCTCGA TC 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CCTGGATCCA TCGAGGGTAG GTTCCCAACC ATTCCCTTAT 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CCGAAGCTTA GAAGCCACAG CTGCCC 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGTCCTGGAT CCATCGAGGG TAGGTACTCG CGGGAGAAG 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 
CGACCGAAGC TTCAGAGTTC GTTGTG 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (synthetic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CGTCCTGGAT CCATCGAGGG TAGGGCTATC GACGCCCCTA AG 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CGACCGAAGC TTATCGGCAG TGGGGCCCCT 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CGACCGAAGC TTAGGCCTTG CAGGAGCGG 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CGACCGAAGC TTACTTCTTG CATGACTTCC CG 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (synthetic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CGTCCTGGAT CCATCGAGGG TAGGGGCACC AACAAATGCC GG



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CGACCGAAGC TTAGTCCAGG CTGCGGCAG 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CGTCCTGGAT CCATCGAGGG TAGGGTGCCT CCACCCCAGT G 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CGACCGAAGC TTACTGGTCG CAGAGCTCG 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CCTTGATCAA TCGAGGGTAG GGGTGGTCAG TGCTCTCTGA ATAACG 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CGCAAGCTTA CTTAAACTCA TAGCAGGTG 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
CGTCCTGGAT CCATCGAGGG TAGGGCGGTG AATTCCTCTT GCCG 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CGACCGAAGC TTAGATGTGG CAGCCACGCT 



(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CGTCCTGGAT CCATCGAGGG TAGGGTGTCC AACTGCACGG CT 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CGACCGAAGC TTAGATGCTG CAGTCCTCCT 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CGTCCTGGAT CCATCGAGGG TAGGAGTAAA TACAAAGATG GAGACCA 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CGACCGAAGC TTACCAGGTG GCAGGGGCTT 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CTGCCTGGAT CCATCGAGGG TAGGAAAGTG TATCTCTCAT CAGAGTGCAA GACTGGGAAT GG 62 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CGACCGAAGC TTATTCACAC TCAAGAATGT CGC 33 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CTGCCTGGAT CCATCGAGGG TAGGGTCCAG GACTGCTACC AT 42 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 



CGACCGAAGC TTACGCTTCT GTTCCTGAGC A 



31 
(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
CCTGGATCCA TCGAGGGTAG GGTCTACCTC CAGACATCCT 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
CCGAAGCTTC AAGCATTTCC AAGATC 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CCTGGATCCA TCGAGGGTAG GGGCGAGCCA CCAACCCAG 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34
CCGAAGCTTA CACGATCCCG AACTG 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
CCGAGATCTA TCGAGGGTAG GCAGGTCAAA CTGCAGCA 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (synthetic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
GCCAAGCTTA ATTCAGATCC TCTTCTGAG 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 

Gly Ser Ile Glu Gly Arg
1 5 
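
Several of the synthetic primers listed above contain the segment GGATCC (the BamHI recognition sequence) followed by ATC GAG GGT AGG or ATC GAG GGT AGA. As a hedged illustration, the Python sketch below translates the six codons starting at GGATCC in two of the primers, using the standard genetic code, and compares the result with the peptide of SEQ ID NO: 37. The helper names are hypothetical, and the choice of reading frame (starting at the first G of GGATCC) is an assumption made for this illustration.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (one-letter codes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate complete codons of an upper-case DNA string."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def linker_peptide(primer, site="GGATCC", codons=6):
    """Translate `codons` codons starting at the first occurrence of `site`."""
    primer = primer.replace(" ", "")
    start = primer.find(site)
    if start < 0:
        return None  # site not present in this primer
    return translate(primer[start:start + 3 * codons])


# Primer sequences copied from the listing (SEQ ID NO: 3 and SEQ ID NO: 7).
seq_id_3 = "CGTCCTGGAT CCATCGAGGG TAGAATCCAG CGTACTCCAA AG"
seq_id_7 = "CCTGGATCCA TCGAGGGTAG GTTCCCAACC ATTCCCTTAT"

print(linker_peptide(seq_id_3), linker_peptide(seq_id_7))
```

Both primers give GSIEGR in this frame, i.e. Gly Ser Ile Glu Gly Arg, the sequence listed as SEQ ID NO: 37. The last four residues match the Ile Glu Gly Arg motif of SEQ ID NO: 38.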



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38

Ile Glu Gly Arg
1



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 

Tyr Trp Thr Asp 
1 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 

Ile Gln Gly Arg
1 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Ala Glu Gly Arg 
1 
(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 

Ala Gln Gly Arg



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 
Ile Cys Gly Arg



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44
Ala Cys Gly Arg 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

Ile Met Gly Arg
1 



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Ala Met Gly Arg 
1 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

His His His His His His 
1 5 



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Met Gly Ser His His His His His His Gly Ser Ile Glu Gly Arg
1 5 10 15 



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 119 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser 
1 5 10 15

Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg
20 25 30 

His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser 
35 40 45 

Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu
50 55 60

Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp
65 70 75 80 

Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp 
85 90 95 

Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile
100 105 110 

Val Lys Trp Asp Arg Asp Met 
115 



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 119 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Met Ala Arg Ser Val Thr Leu Val Phe Leu Val Leu Val Ser Leu Thr 
1 5 10 15

Gly Leu Tyr Ala Ile Gln Lys Thr Pro Gln Ile Gln Val Tyr Ser Arg
20 25 30

His Pro Pro Glu Asn Gly Lys Pro Asn Ile Leu Asn Cys Tyr Val Thr
35 40 45

Gln Phe His Pro Pro His Ile Glu Ile Gln Met Leu Lys Asn Gly Lys
50 55 60

Lys Ile Pro Lys Val Glu Met Ser Asp Met Ser Phe Ser Lys Asp Trp
65 70 75 80

Ser Phe Tyr Ile Leu Ala His Thr Glu Phe Thr Pro Thr Glu Thr Asp
85 90 95 

Thr Tyr Ala Cys Arg Val Lys His Asp Ser Met Ala Glu Pro Lys Thr 
100 105 110 

Val Tyr Trp Asp Arg Asp Met 
115 



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 217 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

Met Ala Thr Gly Ser Arg Thr Ser Leu Leu Leu Ala Phe Gly Leu Leu 
1 5 10 15

Cys Leu Pro Trp Leu Gln Glu Gly Ser Ala Phe Pro Thr Ile Pro Leu
20 25 30

Ser Arg Leu Phe Asp Asn Ala Ser Leu Arg Ala His Arg Leu His Gln
35 40 45

Leu Ala Phe Asp Thr Tyr Gln Glu Phe Glu Glu Ala Tyr Ile Pro Lys
50 55 60

Glu Gln Lys Tyr Ser Phe Leu Gln Asn Pro Gln Thr Ser Leu Cys Phe
65 70 75 80

Ser Glu Ser Ile Pro Thr Pro Ser Asn Arg Glu Glu Thr Gln Gln Lys
85 90 95

Ser Asn Leu Glu Leu Leu Arg Ile Ser Leu Leu Leu Ile Gln Ser Trp
100 105 110

Leu Glu Pro Val Gln Phe Leu Arg Ser Val Phe Ala Asn Ser Leu Val
115 120 125 

Tyr Gly Ala Ser Asp Ser Asn Val Tyr Asp Leu Leu Lys Asp Leu Glu 
130 135 140 

Glu Gly Ile Gln Thr Leu Met Gly Arg Leu Glu Asp Gly Ser Pro Arg
145 150 155 160

Thr Gly Gln Ile Phe Lys Gln Thr Tyr Ser Lys Phe Asp Thr Asn Ser
165 170 175

His Asn Asp Asp Ala Leu Leu Lys Asn Tyr Gly Leu Leu Tyr Cys Phe
180 185 190

Arg Lys Asp Met Asp Lys Val Glu Thr Phe Leu Arg Ile Val Gln Cys
195 200 205 

Arg Ser Val Glu Gly Ser Cys Gly Phe
210 215



(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4544 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Met Leu Thr Pro Pro Leu Leu Leu Leu Leu Pro Leu Leu Ser Ala Leu
1 5 10 15

Val Ala Ala Ala Ile Asp Ala Pro Lys Thr Cys Ser Pro Lys Gln Phe
20 25 30

Ala Cys Arg Asp Gln Ile Thr Cys Ile Ser Lys Gly Trp Arg Cys Asp
35 40 45

Gly Glu Arg Asp Cys Pro Asp Gly Ser Asp Glu Ala Pro Glu Ile Cys
50 55 60

Pro Gln Ser Lys Ala Gln Arg Cys Gln Pro Asn Glu His Asn Cys Leu
65 70 75 80

Gly Thr Glu Leu Cys Val Pro Met Ser Arg Leu Cys Asn Gly Val Gln
85 90 95

Asp Cys Met Asp Gly Ser Asp Glu Gly Pro His Cys Arg Glu Leu Gln
100 105 110

Gly Asn Cys Ser Arg Leu Gly Cys Gln His His Cys Val Pro Thr Leu
115 120 125

Asp Gly Pro Thr Cys Tyr Cys Asn Ser Ser Phe Gln Leu Gln Ala Asp
130 135 140

Gly Lys Thr Cys Lys Asp Phe Asp Glu Cys Ser Val Tyr Gly Thr Cys
145 150 155 160

Ser Gln Leu Cys Thr Asn Thr Asp Gly Ser Phe Ile Cys Gly Cys Val
165 170 175



Glu Gly Tyr Leu Leu Gin Pro Asp Asn Arg Ser Cys Lys Ala Lys Asn 
180 185 190 
Glu Pro Val Asp Arg Pro Pro Val Leu Leu Ile Ala Asn Ser Gln Asn
195 200 205

Ile Leu Ala Thr Tyr Leu Ser Gly Ala Gln Val Ser Thr Ile Thr Pro
210 215 220

Thr Ser Thr Arg Gln Thr Thr Ala Met Asp Phe Ser Tyr Ala Asn Glu
225 230 235 240

Thr Val Cys Trp Val His Val Gly Asp Ser Ala Ala Gln Thr Gln Leu
245 250 255 

Lys Cys Ala Arg Met Pro Gly Leu Lys Gly Phe Val Asp Glu His Thr 
260 265 270 

Ile Asn Ile Ser Leu Ser Leu His His Val Glu Gln Met Ala Ile Asp
275 280 285

Trp Leu Thr Gly Asn Phe Tyr Phe Val Asp Asp Ile Asp Asp Arg Ile
290 295 300

Phe Val Cys Asn Arg Asn Gly Asp Thr Cys Val Thr Leu Leu Asp Leu
305 310 315 320

Glu Leu Tyr Asn Pro Lys Gly Ile Ala Leu Asp Pro Ala Met Gly Lys
325 330 335

Val Phe Phe Thr Asp Tyr Gly Gln Ile Pro Lys Val Glu Arg Cys Asp
340 345 350

Met Asp Gly Gln Asn Arg Thr Lys Leu Val Asp Ser Lys Ile Val Phe
355 360 365

Pro His Gly Ile Thr Leu Asp Leu Val Ser Arg Leu Val Tyr Trp Ala
370 375 380

Asp Ala Tyr Leu Asp Tyr Ile Glu Val Val Asp Tyr Glu Gly Lys Gly
385 390 395 400

Arg Gln Thr Ile Ile Gln Gly Ile Leu Ile Glu His Leu Tyr Gly Leu
405 410 415 

Thr Val Phe Glu Asn Tyr Leu Tyr Ala Thr Asn Ser Asp Asn Ala Asn 
420 425 430 

Ala Gln Gln Lys Thr Ser Val Ile Arg Val Asn Arg Phe Asn Ser Thr
435 440 445

Glu Tyr Gln Val Val Thr Arg Val Asp Lys Gly Gly Ala Leu His Ile
450 455 460

Tyr His Gln Arg Arg Gln Pro Arg Val Arg Ser His Ala Cys Glu Asn
465 470 475 480

Asp Gln Tyr Gly Lys Pro Gly Gly Cys Ser Asp Ile Cys Leu Leu Ala
485 490 495

Asn Ser His Lys Ala Arg Thr Cys Arg Cys Arg Ser Gly Phe Ser Leu
500 505 510 

Gly Ser Asp Gly Lys Ser Cys Lys Lys Pro Glu His Glu Leu Phe Leu 
515 520 525 

Val Tyr Gly Lys Gly Arg Pro Gly Ile Ile Arg Gly Met Asp Met Gly
530 535 540

Ala Lys Val Pro Asp Glu His Met Ile Pro Ile Glu Asn Leu Met Asn
545 550 555 560

Pro Arg Ala Leu Asp Phe His Ala Glu Thr Gly Phe Ile Tyr Phe Ala
565 570 575

Asp Thr Thr Ser Tyr Leu Ile Gly Arg Gln Lys Ile Asp Gly Thr Glu
580 585 590

Arg Glu Thr Ile Leu Lys Asp Gly Ile His Asn Val Glu Gly Val Ala
595 600 605 

Val Asp Trp Met Gly Asp Asn Leu Tyr Trp Thr Asp Asp Gly Pro Lys 
610 615 620 

Lys Thr Ile Ser Val Ala Arg Leu Glu Lys Ala Ala Gln Thr Arg Lys
625 630 635 640

Thr Leu Ile Glu Gly Lys Met Thr His Pro Arg Ala Ile Val Val Asp
645 650 655 

Pro Leu Asn Gly Trp Met Tyr Trp Thr Asp Trp Glu Glu Asp Pro Lys 
660 665 670 

Asp Ser Arg Arg Gly Arg Leu Glu Arg Ala Trp Met Asp Gly Ser His 
675 680 685 

Arg Asp Ile Phe Val Thr Ser Lys Thr Val Leu Trp Pro Asn Gly Leu
690 695 700

Ser Leu Asp Ile Pro Ala Gly Arg Leu Tyr Trp Val Asp Ala Phe Tyr
705 710 715 720

Asp Arg Ile Glu Thr Ile Leu Leu Asn Gly Thr Asp Arg Lys Ile Val
725 730 735 

Tyr Glu Gly Pro Glu Leu Asn His Ala Phe Gly Leu Cys His His Gly 
740 745 750 

Asn Tyr Leu Phe Trp Thr Glu Tyr Arg Ser Gly Ser Val Tyr Arg Leu 
755 760 765 

Glu Arg Gly Val Gly Gly Ala Pro Pro Thr Val Thr Leu Leu Arg Ser 
770 775 780 



Glu Arg Pro Pro Ile Phe Glu Ile Arg Met Tyr Asp Ala Gln Gln Gln
785 790 795 800

Gln Val Gly Thr Asn Lys Cys Arg Val Asn Asn Gly Gly Cys Ser Ser
805 810 815

Leu Cys Leu Ala Thr Pro Gly Ser Arg Gln Cys Ala Cys Ala Glu Asp
820 825 830

Gln Val Leu Asp Ala Asp Gly Val Thr Cys Leu Ala Asn Pro Ser Tyr
835 840 845

Val Pro Pro Pro Gln Cys Gln Pro Gly Glu Phe Ala Cys Ala Asn Ser
850 855 860

Arg Cys Ile Gln Glu Arg Trp Lys Cys Asp Gly Asp Asn Asp Cys Leu
865 870 875 880

Asp Asn Ser Asp Glu Ala Pro Ala Leu Cys His Gln His Thr Cys Pro
885 890 895

Ser Asp Arg Phe Lys Cys Glu Asn Asn Arg Cys Ile Pro Asn Arg Trp
900 905 910 

Leu Cys Asp Gly Asp Asn Asp Cys Gly Asn Ser Glu Asp Glu Ser Asn 
915 920 925 

Ala Thr Cys Ser Ala Arg Thr Cys Pro Pro Asn Gln Phe Ser Cys Ala
930 935 940

Ser Gly Arg Cys Ile Pro Ile Ser Trp Thr Cys Asp Leu Asp Asp Asp
945 950 955 960 

Cys Gly Asp Arg Ser Asp Glu Ser Ala Ser Cys Ala Tyr Pro Thr Cys 
965 970 975 

Phe Pro Leu Thr Gln Phe Thr Cys Asn Asn Gly Arg Cys Ile Asn Ile
980 985 990 

Asn Trp Arg Cys Asp Asn Asp Asn Asp Cys Gly Asp Asn Ser Asp Glu 
995 1000 1005 

Ala Gly Cys Ser His Ser Cys Ser Ser Thr Gln Phe Lys Cys Asn Ser
1010 1015 1020

Gly Arg Cys Ile Pro Glu His Trp Thr Cys Asp Gly Asp Asn Asp Cys
1025 1030 1035 1040

Gly Asp Tyr Ser Asp Glu Thr His Ala Asn Cys Thr Asn Gln Ala Thr
1045 1050 1055

Arg Pro Pro Gly Gly Cys His Thr Asp Glu Phe Gln Cys Arg Leu Asp
1060 1065 1070

Gly Leu Cys Ile Pro Leu Arg Trp Arg Cys Asp Gly Asp Thr Asp Cys
1075 1080 1085

Met Asp Ser Ser Asp Glu Lys Ser Cys Glu Gly Val Thr His Val Cys
1090 1095 1100

Asp Pro Ser Val Lys Phe Gly Cys Lys Asp Ser Ala Arg Cys Ile Ser
1105 1110 1115 1120 

Lys Ala Trp Val Cys Asp Gly Asp Asn Asp Cys Glu Asp Asn Ser Asp 
1125 1130 1135 

Glu Glu Asn Cys Glu Ser Leu Ala Cys Arg Pro Pro Ser His Pro Cys
1140 1145 1150 

Ala Asn Asn Thr Ser Val Cys Leu Pro Pro Asp Lys Leu Cys Asp Gly 
1155 1160 1165 

Asn Asp Asp Cys Gly Asp Gly Ser Asp Glu Gly Glu Leu Cys Asp Gln
1170 1175 1180 

Cys Ser Leu Asn Asn Gly Gly Cys Ser His Asn Cys Ser Val Ala Pro 
1185 1190 1195 1200 

Gly Glu Gly Ile Val Cys Ser Cys Pro Leu Gly Met Glu Leu Gly Pro
1205 1210 1215

Asp Asn His Thr Cys Gln Ile Gln Ser Tyr Cys Ala Lys His Leu Lys
1220 1225 1230

Cys Ser Gln Lys Cys Asp Gln Asn Lys Phe Ser Val Lys Cys Ser Cys
1235 1240 1245 

Tyr Glu Gly Trp Val Leu Glu Pro Asp Gly Glu Ser Cys Arg Ser Leu 
1250 1255 1260 

Asp Pro Phe Lys Pro Phe Ile Ile Phe Ser Asn Arg His Glu Ile Arg
1265 1270 1275 1280

Arg Ile Asp Leu His Lys Gly Asp Tyr Ser Val Leu Val Pro Gly Leu
1285 1290 1295

Arg Asn Thr Ile Ala Leu Asp Phe His Leu Ser Gln Ser Ala Leu Tyr
1300 1305 1310

Trp Thr Asp Val Val Glu Asp Lys Ile Tyr Arg Gly Lys Leu Leu Asp
1315 1320 1325

Asn Gly Ala Leu Thr Ser Phe Glu Val Val Ile Gln Tyr Gly Leu Ala
1330 1335 1340

Thr Pro Glu Gly Leu Ala Val Asp Trp Ile Ala Gly Asn Ile Tyr Trp
1345 1350 1355 1360

Val Glu Ser Asn Leu Asp Gln Ile Glu Val Ala Lys Leu Asp Gly Thr
1365 1370 1375

Leu Arg Thr Thr Leu Leu Ala Gly Asp Ile Glu His Pro Arg Ala Ile
1380 1385 1390

Ala Leu Asp Pro Arg Asp Gly Ile Leu Phe Trp Thr Asp Trp Asp Ala
1395 1400 1405

Ser Leu Pro Arg Ile Glu Ala Ala Ser Met Ser Gly Ala Gly Arg Arg
1410 1415 1420 

Thr Val His Arg Glu Thr Gly Ser Gly Gly Trp Pro Asn Gly Leu Thr 
1425 1430 1435 1440 

Val Asp Tyr Leu Glu Lys Arg Ile Leu Trp Ile Asp Ala Arg Ser Asp
1445 1450 1455

Ala Ile Tyr Ser Ala Arg Tyr Asp Gly Ser Gly His Met Glu Val Leu
1460 1465 1470

Arg Gly His Glu Phe Leu Ser His Pro Phe Ala Val Thr Leu Tyr Gly 
1475 1480 1485 

Gly Glu Val Tyr Trp Thr Asp Trp Arg Thr Asn Thr Leu Ala Lys Ala 
1490 1495 1500 

Asn Lys Trp Thr Gly His Asn Val Thr Val Val Gln Arg Thr Asn Thr
1505 1510 1515 1520

Gln Pro Phe Asp Leu Gln Val Tyr His Pro Ser Arg Gln Pro Met Ala
1525 1530 1535

Pro Asn Pro Cys Glu Ala Asn Gly Gly Gln Gly Pro Cys Ser His Leu
1540 1545 1550

Cys Leu Ile Asn Tyr Asn Arg Thr Val Ser Cys Ala Cys Pro His Leu
1555 1560 1565 

Met Lys Leu His Lys Asp Asn Thr Thr Cys Tyr Glu Phe Lys Lys Phe 
1570 1575 1580 

Leu Leu Tyr Ala Arg Gln Met Glu Ile Arg Gly Val Asp Leu Asp Ala
1585 1590 1595 1600

Pro Tyr Tyr Asn Tyr Ile Ile Ser Phe Thr Val Pro Asp Ile Asp Asn
1605 1610 1615

Val Thr Val Leu Asp Tyr Asp Ala Arg Glu Gln Arg Val Tyr Trp Ser
1620 1625 1630

Asp Val Arg Thr Gln Ala Ile Lys Arg Ala Phe Ile Asn Gly Thr Gly
1635 1640 1645 

Val Glu Thr Val Val Ser Ala Asp Leu Pro Asn Ala His Gly Leu Ala 
1650 1655 1660 

Val Asp Trp Val Ser Arg Asn Leu Phe Trp Thr Ser Tyr Asp Thr Asn 
1665 1670 1675 1680 

Lys Lys Gln Ile Asn Val Ala Arg Leu Asp Gly Ser Phe Lys Asn Ala
1685 1690 1695

Val Val Gln Gly Leu Glu Gln Pro His Gly Leu Val Val His Pro Leu
1700 1705 1710

Arg Gly Lys Leu Tyr Trp Thr Asp Gly Asp Asn Ile Ser Met Ala Asn
1715 1720 1725

Met Asp Gly Ser Asn Arg Thr Leu Leu Phe Ser Gly Gln Lys Gly Pro
1730 1735 1740

Val Gly Leu Ala Ile Asp Phe Pro Glu Ser Lys Leu Tyr Trp Ile Ser
1745 1750 1755 1760

Ser Gly Asn His Thr Ile Asn Arg Cys Asn Leu Asp Gly Ser Gly Leu
1765 1770 1775

Glu Val Ile Asp Ala Met Arg Ser Gln Leu Gly Lys Ala Thr Ala Leu
1780 1785 1790

Ala Ile Met Gly Asp Lys Leu Trp Trp Ala Asp Gln Val Ser Glu Lys
1795 1800 1805 

Met Gly Thr Cys Ser Lys Ala Asp Gly Ser Gly Ser Val Val Leu Arg 
1810 1815 1820 

Asn Ser Thr Thr Leu Val Met His Met Lys Val Tyr Asp Glu Ser Ile
1825 1830 1835 1840

Gln Leu Asp His Lys Gly Thr Asn Pro Cys Ser Val Asn Asn Gly Asp
1845 1850 1855

Cys Ser Gln Leu Cys Leu Pro Thr Ser Glu Thr Thr Arg Ser Cys Met
1860 1865 1870

Cys Thr Ala Gly Tyr Ser Leu Arg Ser Gly Gln Gln Ala Cys Glu Gly
1875 1880 1885

Val Gly Ser Phe Leu Leu Tyr Ser Val His Glu Gly Ile Arg Gly Ile
1890 1895 1900 

Pro Leu Asp Pro Asn Asp Lys Ser Asp Ala Leu Val Pro Val Ser Gly 
1905 1910 1915 1920 

Thr Ser Leu Ala Val Gly Ile Asp Phe His Ala Glu Asn Asp Thr Ile
1925 1930 1935

Tyr Trp Val Asp Met Gly Leu Ser Thr Ile Ser Arg Ala Lys Arg Asp
1940 1945 1950

Gln Thr Trp Arg Glu Asp Val Val Thr Asn Gly Ile Gly Arg Val Glu
1955 1960 1965

Gly Ile Ala Val Asp Trp Ile Ala Gly Asn Ile Tyr Trp Thr Asp Gln
1970 1975 1980

Gly Phe Asp Val Ile Glu Val Ala Arg Leu Asn Gly Ser Phe Arg Tyr
1985 1990 1995 2000

Val Val Ile Ser Gln Gly Leu Asp Lys Pro Arg Ala Ile Thr Val His
2005 2010 2015

Pro Glu Lys Gly Tyr Leu Phe Trp Thr Glu Trp Gly Gln Tyr Pro Arg
2020 2025 2030

Ile Glu Arg Ser Arg Leu Asp Gly Thr Glu Arg Val Val Leu Val Asn
2035 2040 2045

Val Ser Ile Ser Trp Pro Asn Gly Ile Ser Val Asp Tyr Gln Asp Gly
2050 2055 2060

Lys Leu Tyr Trp Cys Asp Ala Arg Thr Asp Lys Ile Glu Arg Ile Asp
2065 2070 2075 2080 

Leu Glu Thr Gly Glu Asn Arg Glu Val Val Leu Ser Ser Asn Asn Met 
2085 2090 2095 

Asp Met Phe Ser Val Ser Val Phe Glu Asp Phe Ile Tyr Trp Ser Asp
2100 2105 2110

Arg Thr His Ala Asn Gly Ser Ile Lys Arg Gly Ser Lys Asp Asn Ala
2115 2120 2125

Thr Asp Ser Val Pro Leu Arg Thr Gly Ile Gly Val Gln Leu Lys Asp
2130 2135 2140

Ile Lys Val Phe Asn Arg Asp Arg Gln Lys Gly Thr Asn Val Cys Ala
2145 2150 2155 2160

Val Ala Asn Gly Gly Cys Gln Gln Leu Cys Leu Tyr Arg Gly Arg Gly
2165 2170 2175

Gln Arg Ala Cys Ala Cys Ala His Gly Met Leu Ala Glu Asp Gly Ala
2180 2185 2190

Ser Cys Arg Glu Tyr Ala Gly Tyr Leu Leu Tyr Ser Glu Arg Thr Ile
2195 2200 2205

Leu Lys Ser Ile His Leu Ser Asp Glu Arg Asn Leu Asn Ala Pro Val
2210 2215 2220

Gln Pro Phe Glu Asp Pro Glu His Met Lys Asn Val Ile Ala Leu Ala
2225 2230 2235 2240

Phe Asp Tyr Arg Ala Gly Thr Ser Pro Gly Thr Pro Asn Arg Ile Phe
2245 2250 2255

Phe Ser Asp Ile His Phe Gly Asn Ile Gln Gln Ile Asn Asp Asp Gly
2260 2265 2270

Ser Arg Arg Ile Thr Ile Val Glu Asn Val Gly Ser Val Glu Gly Leu
2275 2280 2285 

Ala Tyr His Arg Gly Trp Asp Thr Leu Tyr Trp Thr Ser Tyr Thr Thr 
2290 2295 2300 



Ser Thr Ile Thr Arg His Thr Val Asp Gln Thr Arg Pro Gly Ala Phe
2305 2310 2315 2320

Glu Arg Glu Thr Val Ile Thr Met Ser Gly Asp Asp His Pro Arg Ala
2325 2330 2335

Phe Val Leu Asp Glu Cys Gln Asn Leu Met Phe Trp Thr Asn Trp Asn
2340 2345 2350

Glu Gln His Pro Ser Ile Met Arg Ala Ala Leu Ser Gly Ala Asn Val
2355 2360 2365

Leu Thr Leu Ile Glu Lys Asp Ile Arg Thr Pro Asn Gly Leu Ala Ile
2370 2375 2380 

Asp His Arg Ala Glu Lys Leu Tyr Phe Ser Asp Ala Thr Leu Asp Lys 
2385 2390 2395 2400 

Ile Glu Arg Cys Glu Tyr Asp Gly Ser His Arg Tyr Val Ile Leu Lys
2405 2410 2415

Ser Glu Pro Val His Pro Phe Gly Leu Ala Val Tyr Gly Glu His Ile
2420 2425 2430

Phe Trp Thr Asp Trp Val Arg Arg Ala Val Gln Arg Ala Asn Lys His
2435 2440 2445

Val Gly Ser Asn Met Lys Leu Leu Arg Val Asp Ile Pro Gln Gln Pro
2450 2455 2460

Met Gly Ile Ile Ala Val Ala Asn Asp Thr Asn Ser Cys Glu Leu Ser
2465 2470 2475 2480

Pro Cys Arg Ile Asn Asn Gly Gly Cys Gln Asp Leu Cys Leu Leu Thr
2485 2490 2495

His Gln Gly His Val Asn Cys Ser Cys Arg Gly Gly Arg Ile Leu Gln
2500 2505 2510

Asp Asp Leu Thr Cys Arg Ala Val Asn Ser Ser Cys Arg Ala Gln Asp
2515 2520 2525

Glu Phe Glu Cys Ala Asn Gly Glu Cys Ile Asn Phe Ser Leu Thr Cys
2530 2535 2540 

Asp Gly Val Pro His Cys Lys Asp Lys Ser Asp Glu Lys Pro Ser Tyr 
2545 2550 2555 2560 

Cys Asn Ser Arg Arg Cys Lys Lys Thr Phe Arg Gln Cys Ser Asn Gly
2565 2570 2575 

Arg Cys Val Ser Asn Met Leu Trp Cys Asn Gly Ala Asp Asp Cys Gly 
2580 2585 2590 

Asp Gly Ser Asp Glu Ile Pro Cys Asn Lys Thr Ala Cys Gly Val Gly
2595 2600 2605

Glu Phe Arg Cys Arg Asp Gly Thr Cys Ile Gly Asn Ser Ser Arg Cys
2610 2615 2620

Asn Gln Phe Val Asp Cys Glu Asp Ala Ser Asp Glu Met Asn Cys Ser
2625 2630 2635 2640 

Ala Thr Asp Cys Ser Ser Tyr Phe Arg Leu Gly Val Lys Gly Val Leu 
2645 2650 2655 

Phe Gln Pro Cys Glu Arg Thr Ser Leu Cys Tyr Ala Pro Ser Trp Val
2660 2665 2670 

Cys Asp Gly Ala Asn Asp Cys Gly Asp Tyr Ser Asp Glu Arg Asp Cys 
2675 2680 2685 

Pro Gly Val Lys Arg Pro Arg Cys Pro Leu Asn Tyr Phe Ala Cys Pro 
2690 2695 2700 

Ser Gly Arg Cys Ile Pro Met Ser Trp Thr Cys Asp Lys Glu Asp Asp
2705 2710 2715 2720 

Cys Glu His Gly Glu Asp Glu Thr His Cys Asn Lys Phe Cys Ser Glu 
2725 2730 2735 

Ala Gln Phe Glu Cys Gln Asn His Arg Cys Ile Ser Lys Gln Trp Leu
2740 2745 2750 

Cys Asp Gly Ser Asp Asp Cys Gly Asp Gly Ser Asp Glu Ala Ala His 
2755 2760 2765 

Cys Glu Gly Lys Thr Cys Gly Pro Ser Ser Phe Ser Cys Pro Gly Thr 
2770 2775 2780 

His Val Cys Val Pro Glu Arg Trp Leu Cys Asp Gly Asp Lys Asp Cys 
2785 2790 2795 2800 

Ala Asp Gly Ala Asp Glu Ser Ile Ala Ala Gly Cys Leu Tyr Asn Ser
2805 2810 2815

Thr Cys Asp Asp Arg Glu Phe Met Cys Gln Asn Arg Gln Cys Ile Pro
2820 2825 2830 

Lys His Phe Val Cys Asp His Asp Arg Asp Cys Ala Asp Gly Ser Asp 
2835 2840 2845 

Glu Ser Pro Glu Cys Glu Tyr Pro Thr Cys Gly Pro Ser Glu Phe Arg 
2850 2855 2860 

Cys Ala Asn Gly Arg Cys Leu Ser Ser Arg Gln Trp Glu Cys Asp Gly
2865 2870 2875 2880

Glu Asn Asp Cys His Asp Gln Ser Asp Glu Ala Pro Lys Asn Pro His
2885 2890 2895

Cys Thr Ser Pro Glu His Lys Cys Asn Ala Ser Ser Gln Phe Leu Cys
2900 2905 2910

Ser Ser Gly Arg Cys Val Ala Glu Ala Leu Leu Cys Asn Gly Gln Asp
2915 2920 2925

Asp Cys Gly Asp Ser Ser Asp Glu Arg Gly Cys His Ile Asn Glu Cys
2930 2935 2940

Leu Ser Arg Lys Leu Ser Gly Cys Ser Gln Asp Cys Glu Asp Leu Lys
2945 2950 2955 2960

Ile Gly Phe Lys Cys Arg Cys Arg Pro Gly Phe Arg Leu Lys Asp Asp
2965 2970 2975 

Gly Arg Thr Cys Ala Asp Val Asp Glu Cys Ser Thr Thr Phe Pro Cys 
2980 2985 2990 

Ser Gln Arg Cys Ile Asn Thr His Gly Ser Tyr Lys Cys Leu Cys Val
2995 3000 3005 

Glu Gly Tyr Ala Pro Arg Gly Gly Asp Pro His Ser Cys Lys Ala Val 
3010 3015 3020 

Thr Asp Glu Glu Pro Phe Leu Ile Phe Ala Asn Arg Tyr Tyr Leu Arg
3025 3030 3035 3040

Lys Leu Asn Leu Asp Gly Ser Asn Tyr Thr Leu Leu Lys Gln Gly Leu
3045 3050 3055

Asn Asn Ala Val Ala Leu Asp Phe Asp Tyr Arg Glu Gln Met Ile Tyr
3060 3065 3070

Trp Thr Asp Val Thr Thr Gln Gly Ser Met Ile Arg Arg Met His Leu
3075 3080 3085 



Asn Gly Ser Asn Val Gln Val Leu His Arg Thr Gly Leu Ser Asn Pro
3090 3095 3100

Asp Gly Leu Ala Val Asp Trp Val Gly Gly Asn Leu Tyr Trp Cys Asp
3105 3110 3115 3120

Lys Gly Arg Asp Thr Ile Glu Val Ser Lys Leu Asn Gly Ala Tyr Arg
3125 3130 3135

Thr Val Leu Val Ser Ser Gly Leu Arg Glu Pro Arg Ala Leu Val Val
3140 3145 3150

Asp Val Gln Asn Gly Tyr Leu Tyr Trp Thr Asp Trp Gly Asp His Ser
3155 3160 3165

Leu Ile Gly Arg Ile Gly Met Asp Gly Ser Ser Arg Ser Val Ile Val
3170 3175 3180

Asp Thr Lys Ile Thr Trp Pro Asn Gly Leu Thr Leu Asp Tyr Val Thr
3185 3190 3195 3200

Glu Arg Ile Tyr Trp Ala Asp Ala Arg Glu Asp Tyr Ile Glu Phe Ala
3205 3210 3215

Ser Leu Asp Gly Ser Asn Arg His Val Val Leu Ser Gln Asp Ile Pro
3220 3225 3230

His Ile Phe Ala Leu Thr Leu Phe Glu Asp Tyr Val Tyr Trp Thr Asp
3235 3240 3245

Trp Glu Thr Lys Ser Ile Asn Arg Ala His Lys Thr Thr Gly Thr Asn
3250 3255 3260

Lys Thr Leu Leu Ile Ser Thr Leu His Arg Pro Met Asp Leu His Val
3265 3270 3275 3280

Phe His Ala Leu Arg Gln Pro Asp Val Pro Asn His Pro Cys Lys Val
3285 3290 3295 

Asn Asn Gly Gly Cys Ser Asn Leu Cys Leu Leu Ser Pro Gly Gly Gly 
3300 3305 3310 



His Lys Cys Ala Cys Pro Thr Asn Phe Tyr Leu Gly Ser Asp Gly Arg 
3315 3320 3325 

Thr Cys Val Ser Asn Cys Thr Ala Ser Gln Phe Val Cys Lys Asn Asp
3330 3335 3340

Lys Cys Ile Pro Phe Trp Trp Lys Cys Asp Thr Glu Asp Asp Cys Gly
3345 3350 3355 3360 

Asp His Ser Asp Glu Pro Pro Asp Cys Pro Glu Phe Lys Cys Arg Pro 
3365 3370 3375 

Gly Gln Phe Gln Cys Ser Thr Gly Ile Cys Thr Asn Pro Ala Phe Ile
3380 3385 3390

Cys Asp Gly Asp Asn Asp Cys Gln Asp Asn Ser Asp Glu Ala Asn Cys
3395 3400 3405

Asp Ile His Val Cys Leu Pro Ser Gln Phe Lys Cys Thr Asn Thr Asn
3410 3415 3420

Arg Cys Ile Pro Gly Ile Phe Arg Cys Asn Gly Gln Asp Asn Cys Gly
3425 3430 3435 3440 

Asp Gly Glu Asp Glu Arg Asp Cys Pro Glu Val Thr Cys Ala Pro Asn 
3445 3450 3455 

Gln Phe Gln Cys Ser Ile Thr Lys Arg Cys Ile Pro Arg Val Trp Val
3460 3465 3470 

Cys Asp Arg Asp Asn Asp Cys Val Asp Gly Ser Asp Glu Pro Ala Asn 
3475 3480 3485 

Cys Thr Gln Met Thr Cys Gly Val Asp Glu Phe Arg Cys Lys Asp Ser
3490 3495 3500

Gly Arg Cys Ile Pro Ala Arg Trp Lys Cys Asp Gly Glu Asp Asp Cys
3505 3510 3515 3520

Gly Asp Gly Ser Asp Glu Pro Lys Glu Glu Cys Asp Glu Arg Thr Cys
3525 3530 3535

Glu Pro Tyr Gln Phe Arg Cys Lys Asn Asn Arg Cys Val Pro Gly Arg
3540 3545 3550

Trp Gln Cys Asp Tyr Asp Asn Asp Cys Gly Asp Asn Ser Asp Glu Glu
3555 3560 3565

Ser Cys Thr Pro Arg Pro Cys Ser Glu Ser Glu Phe Ser Cys Ala Asn 
3570 3575 3580 

Gly Arg Cys Ile Ala Gly Arg Trp Lys Cys Asp Gly Asp His Asp Cys
3585 3590 3595 3600 

Ala Asp Gly Ser Asp Glu Lys Asp Cys Thr Pro Arg Cys Asp Met Asp 
3605 3610 3615 

Gln Phe Gln Cys Lys Ser Gly His Cys Ile Pro Leu Arg Trp Arg Cys
3620 3625 3630 

Asp Ala Asp Ala Asp Cys Met Asp Gly Ser Asp Glu Glu Ala Cys Gly 
3635 3640 3645 

Thr Gly Val Arg Thr Cys Pro Leu Asp Glu Phe Gln Cys Asn Asn Thr
3650 3655 3660 

Leu Cys Lys Pro Leu Ala Trp Lys Cys Asp Gly Glu Asp Asp Cys Gly 
3665 3670 3675 3680 

Asp Asn Ser Asp Glu Asn Pro Glu Glu Cys Ala Arg Phe Val Cys Pro 
3685 3690 3695 

Pro Asn Arg Pro Phe Arg Cys Lys Asn Asp Arg Val Cys Leu Trp Ile
3700 3705 3710

Gly Arg Gln Cys Asp Gly Thr Asp Asn Cys Gly Asp Gly Thr Asp Glu
3715 3720 3725 

Glu Asp Cys Glu Pro Pro Thr Ala His Thr Thr His Cys Lys Asp Lys 
3730 3735 3740 

Lys Glu Phe Leu Cys Arg Asn Gln Arg Cys Leu Ser Ser Ser Leu Arg
3745 3750 3755 3760 

Cys Asn Met Phe Asp Asp Cys Gly Asp Gly Ser Asp Glu Glu Asp Cys 
3765 3770 3775 

Ser Ile Asp Pro Lys Leu Thr Ser Cys Ala Thr Asn Ala Ser Ile Cys
3780 3785 3790 

Gly Asp Glu Ala Arg Cys Val Arg Thr Glu Lys Ala Ala Tyr Cys Ala 
3795 3800 3805 

Cys Arg Ser Gly Phe His Thr Val Pro Gly Gln Pro Gly Cys Gln Asp
3810 3815 3820

Ile Asn Glu Cys Leu Arg Phe Gly Thr Cys Ser Gln Leu Cys Asn Asn
3825 3830 3835 3840

Thr Lys Gly Gly His Leu Cys Ser Cys Ala Arg Asn Phe Met Lys Thr
3845 3850 3855

His Asn Thr Cys Lys Ala Glu Gly Ser Glu Tyr Gln Val Leu Tyr Ile
3860 3865 3870

Ala Asp Asp Asn Glu Ile Arg Ser Leu Phe Pro Gly His Pro His Ser
3875 3880 3885

Ala Tyr Glu Gln Ala Phe Gln Gly Asp Glu Ser Val Arg Ile Asp Ala
3890 3895 3900 

Met Asp Val His Val Lys Ala Gly Arg Val Tyr Trp Thr Asn Trp His 
3905 3910 3915 3920 

Thr Gly Thr Ile Ser Tyr Arg Ser Leu Pro Pro Ala Ala Pro Pro Thr
3925 3930 3935

Thr Ser Asn Arg His Arg Arg Gln Ile Asp Arg Gly Val Thr His Leu
3940 3945 3950

Asn Ile Ser Gly Leu Lys Met Pro Arg Gly Ile Ala Ile Asp Trp Val
3955 3960 3965

Ala Gly Asn Val Tyr Trp Thr Asp Ser Gly Arg Asp Val Ile Glu Val
3970 3975 3980

Ala Gln Met Lys Gly Glu Asn Arg Lys Thr Leu Ile Ser Gly Met Ile
3985 3990 3995 4000

Asp Glu Pro His Ala Ile Val Val Asp Pro Leu Arg Gly Thr Met Tyr
4005 4010 4015

Trp Ser Asp Trp Gly Asn His Pro Lys Ile Glu Thr Ala Ala Met Asp
4020 4025 4030

Gly Thr Leu Arg Glu Thr Leu Val Gln Asp Asn Ile Gln Trp Pro Thr
4035 4040 4045 

Gly Leu Ala Val Asp Tyr His Asn Glu Arg Leu Tyr Trp Ala Asp Ala 
4050 4055 4060 



Lys Leu Ser Val Ile Gly Ser Ile Arg Leu Asn Gly Thr Asp Pro Ile
4065 4070 4075 4080

Val Ala Ala Asp Ser Lys Arg Gly Leu Ser His Pro Phe Ser Ile Asp
4085 4090 4095

Val Phe Glu Asp Tyr Ile Tyr Gly Val Thr Tyr Ile Asn Asn Arg Val
4100 4105 4110

Phe Lys Ile His Lys Phe Gly His Ser Pro Leu Val Asn Leu Thr Gly
4115 4120 4125

Gly Leu Ser His Ala Ser Asp Val Val Leu Tyr His Gln His Lys Gln
4130 4135 4140

Pro Glu Val Thr Asn Pro Cys Asp Arg Lys Lys Cys Glu Trp Leu Cys
4145 4150 4155 4160 

Leu Leu Ser Pro Ser Gly Pro Val Cys Thr Cys Pro Asn Gly Lys Arg 
4165 4170 4175 

Leu Asp Asn Gly Thr Cys Val Pro Val Pro Ser Pro Thr Pro Pro Pro 
4180 4185 4190 

Asp Ala Pro Arg Pro Gly Thr Cys Asn Leu Gln Cys Phe Asn Gly Gly
4195 4200 4205

Ser Cys Phe Leu Asn Ala Arg Arg Gln Pro Lys Cys Arg Cys Gln Pro
4210 4215 4220

Arg Tyr Thr Gly Asp Lys Cys Glu Leu Asp Gln Cys Trp Glu His Cys
4225 4230 4235 4240 

Arg Asn Gly Gly Thr Cys Ala Ala Ser Pro Ser Gly Met Pro Thr Cys 
4245 4250 4255 

Arg Cys Pro Thr Gly Phe Thr Gly Pro Lys Cys Thr Gln Gln Val Cys
4260 4265 4270

Ala Gly Tyr Cys Ala Asn Asn Ser Thr Cys Thr Val Asn Gln Gly Asn
4275 4280 4285

Gln Pro Gln Cys Arg Cys Leu Pro Gly Phe Leu Gly Asp Arg Cys Gln
4290 4295 4300

Tyr Arg Gln Cys Ser Gly Tyr Cys Glu Asn Phe Gly Thr Cys Gln Met
4305 4310 4315 4320

Ala Ala Asp Gly Ser Arg Gln Cys Arg Cys Thr Ala Tyr Phe Glu Gly
4325 4330 4335 

Ser Arg Cys Glu Val Asn Lys Cys Ser Arg Cys Leu Glu Gly Ala Cys 
4340 4345 4350 

Val Val Asn Lys Gln Ser Gly Asp Val Thr Cys Asn Cys Thr Asp Gly
4355 4360 4365 

Arg Val Ala Pro Ser Cys Leu Thr Cys Val Gly His Cys Ser Asn Gly 
4370 4375 4380 

Gly Ser Cys Thr Met Asn Ser Lys Met Met Pro Glu Cys Gln Cys Pro
4385 4390 4395 4400

Pro His Met Thr Gly Pro Arg Cys Glu Glu His Val Phe Ser Gln Gln
4405 4410 4415

Gln Pro Gly His Ile Ala Ser Ile Leu Ile Pro Leu Leu Leu Leu Leu
4420 4425 4430

Leu Leu Val Leu Val Ala Gly Val Val Phe Trp Tyr Lys Arg Arg Val
4435 4440 4445

Gln Gly Ala Lys Gly Phe Gln His Gln Arg Met Thr Asn Gly Ala Met
4450 4455 4460

Asn Val Glu Ile Gly Asn Pro Thr Tyr Lys Met Tyr Glu Gly Gly Glu
4465 4470 4475 4480 

Pro Asp Asp Val Gly Gly Leu Leu Asp Ala Asp Phe Ala Leu Asp Pro 
4485 4490 4495 

Asp Lys Pro Thr Asn Phe Thr Asn Pro Val Tyr Ala Thr Leu Tyr Met 
4500 4505 4510

Gly Gly His Gly Ser Arg His Ser Leu Ala Ser Thr Asp Glu Lys Arg 
4515 4520 4525 

Glu Leu Leu Gly Arg Gly Pro Glu Asp Glu Ile Gly Asp Pro Leu Ala
4530 4535 4540 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 487 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Met Ala Gly Leu Leu His Leu Val Leu Leu Ser Thr Ala Leu Gly Gly 
1 5 10 15 

Leu Leu Arg Pro Ala Gly Ser Val Phe Leu Pro Arg Asp Gln Ala His
20 25 30

Arg Val Leu Gln Arg Ala Arg Arg Ala Asn Ser Phe Leu Glu Glu Val
35 40 45

Lys Gln Gly Asn Leu Glu Arg Glu Cys Leu Glu Glu Ala Cys Ser Leu
50 55 60

Glu Glu Ala Arg Glu Val Phe Glu Asp Ala Glu Gln Thr Asp Glu Phe
65 70 75 80

Trp Ser Lys Tyr Lys Asp Gly Asp Gln Cys Glu Gly His Pro Cys Leu
85 90 95

Asn Gln Gly His Cys Lys Asp Gly Ile Gly Asp Tyr Thr Cys Thr Cys
100 105 110

Ala Glu Gly Phe Glu Gly Lys Asn Cys Glu Phe Ser Thr Arg Glu Ile
115 120 125

Cys Ser Leu Asp Asn Gly Gly Cys Asp Gln Phe Cys Arg Glu Glu Arg
130 135 140

Ser Glu Val Arg Cys Ser Cys Ala His Gly Tyr Val Leu Gly Asp Asp
145 150 155 160

Ser Lys Ser Cys Val Ser Thr Glu Arg Phe Pro Cys Gly Lys Phe Thr
165 170 175

Gln Gly Arg Ser Arg Arg Trp Ala Ile His Thr Ser Glu Asp Ala Leu
180 185 190 

Asp Ala Ser Glu Leu Glu His Tyr Asp Pro Ala Asp Leu Ser Pro Thr 
195 200 205 

Glu Ser Ser Leu Asp Leu Leu Gly Leu Asn Arg Thr Glu Pro Ser Ala 
210 215 220 

Gly Glu Asp Gly Ser Gln Val Val Arg Ile Val Gly Gly Arg Asp Cys
225 230 235 240

Ala Glu Gly Glu Cys Pro Trp Gln Ala Leu Leu Val Asn Glu Glu Asn
245 250 255

Glu Gly Phe Cys Gly Gly Thr Ile Leu Asn Glu Phe Tyr Val Leu Thr
260 265 270

Ala Ala His Cys Leu His Gln Ala Lys Arg Phe Thr Val Arg Val Gly
275 280 285

Asp Arg Asn Thr Glu Gln Glu Glu Gly Asn Glu Met Ala His Glu Val
290 295 300 

Glu Met Thr Val Lys His Ser Arg Phe Val Lys Glu Thr Tyr Asp Phe 
305 310 315 320 

Asp Ile Ala Val Leu Arg Leu Lys Thr Pro Ile Arg Phe Arg Arg Asn
325 330 335 

Val Ala Pro Ala Cys Leu Pro Glu Lys Asp Trp Ala Glu Ala Thr Leu 
340 345 350 

Met Thr Gln Lys Thr Gly Ile Val Ser Gly Phe Gly Arg Thr His Glu
355 360 365 

Lys Gly Arg Leu Ser Ser Thr Leu Lys Met Leu Glu Val Pro Tyr Val 
370 375 380 

Asp Arg Ser Thr Cys Lys Leu Ser Ser Ser Phe Thr Ile Thr Pro Asn
385 390 395 400

Met Phe Cys Ala Gly Tyr Asp Thr Gln Pro Glu Asp Ala Cys Gln Gly
405 410 415

Asp Ser Gly Gly Pro His Val Thr Arg Phe Lys Asp Thr Tyr Phe Val
420 425 430

Thr Gly Ile Val Ser Trp Gly Glu Gly Cys Ala Arg Lys Gly Lys Phe
435 440 445

Gly Val Tyr Thr Lys Val Ser Asn Phe Leu Lys Trp Ile Asp Lys Ile
450 455 460

Met Lys Ala Arg Ala Gly Ala Ala Gly Ser Arg Gly His Ser Glu Ala
465 470 475 480

Pro Ala Thr Trp Thr Val Pro
485 



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 790 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Glu Pro Leu Asp Asp Tyr Val Asn Thr Gln Gly Ala Ser Leu Phe Ser
1 5 10 15

Val Thr Lys Lys Gln Leu Gly Ala Gly Ser Ile Glu Glu Cys Ala Ala
20 25 30

Lys Cys Glu Glu Asp Glu Glu Phe Thr Cys Arg Ala Phe Gln Tyr His
35 40 45

Ser Lys Glu Gln Gln Cys Val Ile Met Ala Glu Asn Arg Lys Ser Ser
50 55 60

Ile Ile Arg Met Arg Asp Val Val Leu Phe Glu Lys Lys Val Tyr Leu
65 70 75 80

Ser Glu Cys Lys Thr Gly Asn Gly Lys Asn Tyr Arg Gly Thr Met Ser 
85 90 95 

Lys Thr Lys Asn Gly Ile Thr Cys Gln Lys Trp Ser Ser Thr Ser Pro
100 105 110

His Arg Pro Arg Phe Ser Pro Ala Thr His Pro Ser Glu Gly Leu Glu 
115 120 125 

Glu Asn Tyr Cys Arg Asn Pro Asp Asn Asp Pro Gln Gly Pro Trp Cys
130 135 140

Tyr Thr Thr Asp Pro Glu Lys Arg Tyr Asp Tyr Cys Asp Ile Leu Glu
145 150 155 160

Cys Glu Glu Glu Cys Met His Cys Ser Gly Glu Asn Tyr Asp Gly Lys
165 170 175

Ile Ser Lys Thr Met Ser Gly Leu Glu Cys Gln Ala Trp Asp Ser Gln
180 185 190

Ser Pro His Ala His Gly Tyr Ile Pro Ser Lys Phe Pro Asn Lys Asn
195 200 205

Leu Lys Lys Asn Tyr Cys Arg Asn Pro Asp Arg Glu Leu Arg Pro Trp
210 215 220

Cys Phe Thr Thr Asp Pro Asn Lys Arg Trp Glu Leu Cys Asp Ile Pro
225 230 235 240 

Arg Cys Thr Thr Pro Pro Pro Ser Ser Gly Pro Thr Tyr Gln Cys Leu
245 250 255

Lys Gly Thr Gly Glu Asn Tyr Arg Gly Asn Val Ala Val Thr Val Ser
260 265 270

Gly His Thr Cys Gln His Trp Ser Ala Gln Thr Pro His Thr His Asn
275 280 285 

Arg Thr Pro Glu Asn Phe Pro Cys Lys Asn Leu Asp Glu Asn Tyr Cys 
290 295 300 

Arg Asn Pro Asp Gly Lys Arg Ala Pro Trp Cys His Thr Thr Asn Ser 
305 310 315 320 

Gln Val Arg Trp Glu Tyr Cys Lys Ile Pro Ser Cys Asp Ser Ser Pro
325 330 335

Val Ser Thr Glu Glu Leu Ala Pro Thr Ala Pro Pro Glu Leu Thr Pro
340 345 350

Val Val Gln Asp Cys Tyr His Gly Asp Gly Gln Ser Tyr Arg Gly Thr
355 360 365

Ser Ser Thr Thr Thr Thr Gly Lys Lys Cys Gln Ser Trp Ser Ser Met
370 375 380

Thr Pro His Arg His Gln Lys Thr Pro Glu Asn Tyr Pro Asn Ala Gly
385 390 395 400

Leu Thr Met Asn Tyr Cys Arg Asn Pro Asp Ala Asp Lys Gly Pro Trp
405 410 415

Cys Phe Thr Thr Asp Pro Ser Val Arg Trp Glu Tyr Cys Asn Leu Lys
420 425 430

Lys Cys Ser Gly Thr Glu Ala Ser Val Val Ala Pro Pro Pro Val Val
435 440 445

Leu Leu Pro Asn Val Glu Thr Pro Ser Glu Glu Asp Cys Met Phe Gly
450 455 460

Asn Gly Lys Gly Tyr Arg Gly Lys Arg Ala Thr Thr Val Thr Gly Thr
465 470 475 480

Pro Cys Gln Asp Trp Ala Ala Gln Glu Pro His Arg His Ser Ile Phe
485 490 495

Thr Pro Glu Thr Asn Pro Arg Ala Gly Leu Glu Lys Asn Tyr Cys Arg
500 505 510

Asn Pro Asp Gly Asp Val Gly Gly Pro Trp Cys Tyr Thr Thr Asn Pro 
515 520 525 

Arg Lys Leu Tyr Asp Tyr Cys Asp Val Pro Gln Cys Ala Ala Pro Ser
530 535 540

Phe Asp Cys Gly Lys Pro Gln Val Glu Pro Lys Lys Cys Pro Gly Arg
545 550 555 560

Val Val Gly Gly Cys Val Ala His Pro His Ser Trp Pro Trp Gln Val
565 570 575

Ser Leu Arg Thr Arg Phe Gly Met His Phe Cys Gly Gly Thr Leu Ile
580 585 590

Ser Pro Glu Trp Val Leu Thr Ala Ala His Cys Leu Glu Lys Ser Pro
595 600 605

Arg Pro Ser Ser Tyr Lys Val Ile Leu Gly Ala His Gln Glu Val Asn
610 615 620

Leu Glu Pro His Val Gln Glu Ile Glu Val Ser Arg Leu Phe Leu Glu
625 630 635 640

Pro Thr Arg Lys Asp Ile Ala Leu Leu Lys Leu Ser Ser Pro Ala Val
645 650 655

Ile Thr Asp Lys Val Ile Pro Ala Cys Leu Pro Ser Pro Asn Tyr Val
660 665 670

Val Ala Asp Arg Thr Glu Cys Phe Ile Thr Gly Trp Gly Glu Thr Gln
675 680 685

Gly Thr Phe Gly Ala Gly Leu Leu Lys Glu Ala Gln Leu Pro Val Ile
690 695 700

Glu Asn Lys Val Cys Asn Arg Tyr Glu Phe Leu Asn Gly Arg Val Gln
705 710 715 720

Ser Thr Glu Leu Cys Ala Gly His Leu Ala Gly Gly Thr Asp Ser Cys
725 730 735

Gln Gly Asp Ser Gly Gly Pro Leu Val Cys Phe Glu Lys Asp Lys Tyr
740 745 750

Ile Leu Gln Gly Val Thr Ser Trp Gly Leu Gly Cys Ala Arg Pro Asn
755 760 765

Lys Pro Gly Val Tyr Val Arg Val Ser Arg Phe Val Thr Trp Ile Glu
770 775 780 



Gly Val Met Arg Asn Asn 
785 790



(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 153 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 



Val Tyr Leu Gln Thr Ser Leu Lys Tyr Asn Ile Leu Pro Glu Lys Glu
1 5 10 15

Glu Phe Pro Phe Ala Leu Gly Val Gln Thr Leu Pro Gln Thr Cys Asp
20 25 30

Glu Pro Lys Ala His Thr Ser Phe Gln Ile Ser Leu Ser Val Ser Tyr
35 40 45

Thr Gly Ser Arg Ser Ala Ser Asn Met Ala Ile Val Asp Val Lys Met
50 55 60

Val Ser Gly Phe Ile Pro Leu Lys Pro Thr Val Lys Met Leu Glu Arg
65 70 75 80

Ser Asn His Val Ser Arg Thr Glu Val Ser Ser Asn His Val Leu Ile
85 90 95

Tyr Leu Asp Lys Val Ser Asn Gln Thr Leu Ser Leu Phe Phe Thr Val
100 105 110

Leu Gln Asp Val Pro Val Arg Asp Leu Lys Pro Ala Ile Val Lys Val
115 120 125

Tyr Asp Tyr Tyr Glu Thr Asp Glu Phe Ala Ile Ala Glu Tyr Asn Ala
130 135 140 

Pro Cys Ser Lys Asp Leu Gly Asn Ala 
145 150 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 202 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 



Met Glu Leu Trp Gly Ala Tyr Leu Leu Leu Cys Leu Phe Ser Leu Leu 
1 5 10 15

Thr Gln Val Thr Thr Glu Pro Pro Thr Gln Lys Pro Lys Lys Ile Val
20 25 30

Asn Ala Lys Lys Asp Val Val Asn Thr Lys Met Phe Glu Glu Leu Lys
35 40 45

Ser Arg Leu Asp Thr Leu Ala Gln Glu Val Ala Leu Leu Lys Glu Gln
50 55 60

Gln Ala Leu Gln Thr Val Cys Leu Lys Gly Thr Lys Val His Met Lys
65 70 75 80

Cys Phe Leu Ala Phe Thr Gln Thr Lys Thr Phe His Glu Ala Ser Glu
85 90 95

Asp Cys Ile Ser Arg Gly Gly Thr Leu Ser Thr Pro Gln Thr Gly Ser
100 105 110

Glu Asn Asp Ala Leu Tyr Glu Tyr Leu Arg Gln Ser Val Gly Asn Glu
115 120 125

Ala Glu Ile Trp Leu Gly Leu Asn Asp Met Ala Ala Glu Gly Thr Trp
130 135 140

Val Asp Met Thr Gly Ala Arg Ile Ala Tyr Lys Asn Trp Glu Thr Glu
145 150 155 160

Ile Thr Ala Gln Pro Asp Gly Gly Lys Thr Glu Asn Cys Ala Val Leu
165 170 175

Ser Gly Ala Ala Asn Gly Lys Trp Phe Asp Lys Arg Cys Arg Asp Gln
180 185 190

Leu Pro Tyr Ile Cys Gln Phe Gly Ile Val
195 200 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 246 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

Gln Val Lys Leu Gln Gln Ser Gly Ala Glu Leu Val Lys Pro Gly Ala
1 5 10 15

Ser Val Lys Met Ser Cys Lys Ala Ser Gly Tyr Thr Phe Ala Ser Tyr
20 25 30

Trp Ile Asn Trp Val Lys Gln Arg Pro Gly Gln Gly Leu Glu Trp Ile
35 40 45

Gly His Ile Tyr Pro Val Arg Ser Ile Thr Lys Tyr Asn Glu Lys Phe
50 55 60

Lys Ser Lys Ala Thr Leu Thr Leu Asp Thr Ser Ser Ser Thr Ala Tyr
65 70 75 80

Met Gln Leu Ser Ser Leu Thr Ser Glu Asp Ser Ala Val Tyr Tyr Cys
85 90 95

Ser Arg Gly Asp Gly Ser Asp Tyr Tyr Ala Met Asp Tyr Trp Gly Gln
100 105 110

Gly Thr Thr Val Thr Val Ser Ser Gly Gly Gly Gly Ser Asp Ile Glu
115 120 125

Leu Thr Gln Ser Pro Ala Ile Leu Ser Ala Ser Pro Gly Gly Lys Val
130 135 140

Thr Met Thr Cys Arg Ala Ser Ser Ser Val Ser Tyr Met His Trp Tyr
145 150 155 160

Gln Gln Lys Pro Gly Ser Ser Pro Lys Pro Trp Ile Tyr Ala Thr Ser
165 170 175

Asn Leu Ala Ser Gly Val Pro Thr Arg Phe Ser Gly Thr Gly Ser Gly
180 185 190

Thr Ser Tyr Ser Leu Thr Ile Ser Arg Val Glu Ala Glu Asp Ala Ala
195 200 205

Thr Tyr Tyr Cys Gln Gln Trp Ser Arg Asn Pro Phe Thr Phe Gly Ser
210 215 220

Gly Thr Lys Leu Glu Ile Lys Arg Ala Ala Ala Glu Gln Lys Leu Ile
225 230 235 240 

Ser Glu Glu Asp Leu Asn 
245 
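
A listing in three-letter code can be checked mechanically against its declared LENGTH by converting it to one-letter code and counting residues. The following is a minimal sketch in Python, assuming only the 20 standard three-letter residue codes; the helper name `to_one_letter` is hypothetical and not part of the specification. Position-number rows are skipped, and any token that is not a standard code raises an error, which makes transcription errors visible.

```python
# Hypothetical helper (not part of the specification): converts a
# three-letter sequence listing into a one-letter string.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Drop numeric position markers and translate the remaining tokens."""
    residues = [tok for tok in listing.split() if not tok.isdigit()]
    unknown = sorted({r for r in residues if r not in THREE_TO_ONE})
    if unknown:
        # Non-standard tokens usually indicate a transcription error.
        raise ValueError(f"unrecognised residue codes: {unknown}")
    return "".join(THREE_TO_ONE[r] for r in residues)


# Residues 225-246 of SEQ ID NO: 57, written in standard codes.
tail = """
Gly Thr Lys Leu Glu Ile Lys Arg Ala Ala Ala Glu Gln Lys Leu Ile
225 230 235 240
Ser Glu Glu Asp Leu Asn
245
"""
print(to_one_letter(tail))       # GTKLEIKRAAAEQKLISEEDLN
print(len(to_one_letter(tail)))  # 22
```

Applied to a complete entry, the length of the returned string can be compared with the LENGTH field of the corresponding SEQUENCE CHARACTERISTICS block.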



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 101 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

Met Ser Asn Thr Gln Ala Glu Arg Ser Ile Ile Gly Met Ile Asp Met
1 5 10 15

Phe His Lys Tyr Thr Arg Arg Asp Asp Lys Ile Asp Lys Pro Ser Leu
20 25 30

Leu Thr Met Met Lys Glu Asn Phe Pro Asn Phe Leu Ser Ala Cys Asp
35 40 45

Lys Lys Gly Thr Asn Tyr Leu Ala Asp Val Phe Glu Lys Lys Asp Lys
50 55 60

Asn Glu Asp Lys Lys Ile Asp Phe Ser Glu Phe Leu Ser Leu Leu Gly
65 70 75 80

Asp Ile Ala Thr Asp Tyr His Lys Gln Ser His Gly Ala Ala Pro Cys
85 90 95

Ser Gly Gly Ser Gln
100



CLAIMS

1. A method for generating a processed ensemble of polypeptide
molecules, in which processed ensemble the conformational
states represented contain a substantial fraction of
polypeptide molecules in one particular uniform conformation,
from an initial ensemble of polypeptide molecules which have
the same amino acid sequence as the processed ensemble of
polypeptide molecules, comprising subjecting the initial
ensemble of polypeptide molecules to a series of at least two
successive cycles each of which comprises a sequence of

1) at least one denaturing step involving conditions
exerting a denaturing influence on the polypeptide
molecules of the ensemble followed by

2) at least one renaturing step involving conditions
having a renaturing influence on the polypeptide
molecules having conformations resulting from the preceding
step.

2. A method according to claim 1, wherein the substantial
fraction of polypeptide molecules in one conformational state
in the processed ensemble constitutes at least 5% (w/w) of
the initial ensemble of polypeptide molecules.

3. A method according to claim 1 or 2, wherein the
polypeptide molecules of the processed ensemble comprise
cysteine-containing molecules, and the processed ensemble
comprises a substantial fraction of polypeptide molecules in
one particular uniform conformation which, in addition, have
substantially identical disulphide bridging topology.

4. A method according to any of claims 1-3, wherein the
polypeptide molecules are molecules which have an amino acid
sequence identical to that of an authentic polypeptide, or
are molecules which comprise an amino acid sequence
corresponding to that of an authentic polypeptide joined to one or
two additional polypeptide segments.

5. A method according to claim 4, wherein the amino acid
sequence corresponding to that of an authentic polypeptide is
joined to the additional polypeptide segment or segments via
a cleavable junction or similar or dissimilar cleavable
junctions.

6. A method according to any of claims 1-5, wherein the
series comprises at least 3 cycles, such as at least 5, at
least 8, at least 10, and at least 25 cycles, and at most
2000 cycles, such as at most 1000, at most 500, at most 200
cycles, at most 100, and at most 50 cycles.

7. A method according to any of the preceding claims, wherein
the duration of each denaturing step is at least 1 millisecond
and at most 1 hour, and the duration of each renaturing
step is at least 1 second and at most 12 hours.

8. A method according to claim 7, wherein the denaturing
conditions of each individual denaturing step are kept
substantially constant for a period of time, and the renaturing
conditions of each individual renaturing step are kept
substantially constant for a period of time, the periods of time
during which conditions are kept substantially constant being
separated by transition periods during which the conditions
are changed.

9. A method according to claim 8, in which the transition
period between steps for which conditions are kept
substantially constant has a duration between 0.1 second and 12
hours.



10. A method according to claim 9, wherein the period of time
for which the denaturing conditions of the denaturing step
are kept substantially constant has a duration of between 1
and 10 minutes, and the period of time for which the
renaturing conditions of the renaturing step are kept substantially
constant has a duration of between 1 and 45 minutes.

11. A method according to any of the preceding claims,
wherein the polypeptide molecules are in contact with a liquid
phase during the denaturing and renaturing steps, the liquid
phase being an aqueous phase or an organic phase.

12. A method according to claim 11, wherein the polypeptide
molecules are substantially confined to an environment which
allows changing or exchanging the liquid phase substantially
without entraining the polypeptide molecules.

13. A method according to claim 12, wherein the polypeptides 
are confined to a dialysis device or a liquid two-phase 
system. 

14. A method according to claim 12, wherein the polypeptide
molecules are bound to a solid or semisolid carrier, such as
a filter surface, a hollow fibre or a beaded chromatographic
medium, e.g. an agarose or polyacrylamide gel, a fibrous
cellulose matrix, an HPLC or FPLC matrix, a substance having
molecules of such a size that the molecules with the
polypeptide molecules bound thereto, when dissolved or dispersed in
a liquid phase, can be retained by means of a filter, a
substance capable of forming micelles or participating in the
formation of micelles allowing the liquid phase to be changed
or exchanged substantially without entraining the micelles,
or a water-soluble polymer.

15. A method according to claim 14, wherein the polypeptide
molecules are non-covalently adsorbed to the carrier through
a moiety having affinity to a component of the carrier, such
as a biotin group or an analogue thereof bound to an amino
acid moiety of the polypeptide, the carrier having avidin,
streptavidin or analogues thereof attached thereto.

16. A method according to claim 15, wherein the moiety has an
amino acid sequence identical to SEQ ID NO: 47, the carrier
comprising a Nitrilotriacetic Acid derivative (NTA) charged
with Ni++ ions.

17. A method according to any of the preceding claims,
wherein the polypeptide molecules comprise a polypeptide segment
which is capable of directing preferential cleavage by a
cleaving agent at a specific peptide bond.

18. A method according to claim 17, wherein the
cleavage-directing polypeptide segment is one which is capable of
directing preferential cleavage at a specific peptide bond by
a cleaving agent selected from the group consisting of
cyanogen bromide, hydroxylamine, iodosobenzoic acid,
N-bromosuccinimide, and enzymes such as bovine coagulation factor Xa
or an analogue and/or homologue thereof and bovine
enterokinase or an analogue and/or homologue thereof.

19. A method according to claim 17 or 18, wherein the
polypeptide segment which directs preferential cleavage is a
sequence which is substantially selectively recognized by the
bovine coagulation factor Xa or an analogue and/or homologue
thereof, such as a polypeptide segment which has an amino
acid sequence selected from the group consisting of SEQ ID
NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42.

20. A method according to any of the preceding claims,
wherein the polypeptide molecules comprise a polypeptide segment
which is in vitro-convertible into a derivatized polypeptide
segment capable of directing preferential cleavage by a
cleaving agent at a specific peptide bond.

21. A method according to claim 20, wherein the in
vitro-convertible polypeptide segment is convertible into a
derivatized polypeptide segment which is substantially selectively
recognized by the bovine coagulation factor Xa or an analogue
and/or homologue thereof.

22. A method according to claim 21, wherein the in
vitro-convertible polypeptide segment has an amino acid sequence
selected from the group consisting of SEQ ID NO: 43, SEQ ID
NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46.

23. A method according to claim 22 wherein the polypeptide
molecules comprise a polypeptide segment with either

the amino acid sequence SEQ ID NO: 43 or SEQ ID NO: 44,
which is converted into a derivatized polypeptide, which
is substantially selectively recognized by bovine
coagulation factor Xa or an analogue and/or homologue thereof,
by reacting the cysteine residue with
N-(2-mercaptoethyl)morpholyl-2-thiopyridyl disulphide or
mercaptothioacetate-2-thiopyridyl disulphide, or

with the amino acid sequence SEQ ID NO: 45 or SEQ ID NO:
46, which is converted into a derivatized polypeptide,
which is substantially selectively recognized by bovine
coagulation factor Xa, by oxidation of the thioether
moiety in the methionine side group to a sulphoxide or
sulphone derivative.

24. A method according to any of claims 19, 22 or 23, wherein
the polypeptide segment selected from the group consisting of
SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42
or selected from the group consisting of SEQ ID NO: 43, SEQ
ID NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46 is linked
N-terminally to the authentic polypeptide.

25. A method according to any of claims 8-24, wherein the 
change of conditions during the transition period is accom- 
plished by changing the chemical composition of the liquid 
phase with which the polypeptide molecules are in contact. 



26. A method according to claim 25, wherein denaturing of the
polypeptide molecules is accomplished by contacting the
polypeptide molecules with a liquid phase in which at least
one denaturing compound is dissolved, and wherein renaturing
of the polypeptide molecules is accomplished by contacting
the polypeptide molecules with a liquid phase which either
contains at least one dissolved denaturing compound in such a
concentration that the contact with the liquid phase will
tend to renature rather than denature the ensemble of
polypeptide molecules in their respective conformation states
resulting from the preceding step, or contains no denaturing
compound.

27. A method according to claim 26, wherein the denaturing of
the polypeptide molecules is achieved or enhanced by
decreasing or increasing pH of the liquid phase.

28. A method according to claim 26 or 27, wherein the
denaturing compound is selected from urea, guanidine-HCl, and
di-C1-6-alkylformamide such as dimethylformamide and
di-C1-6-alkylsulphone.

29. A method according to any of claims 11-28, wherein the
liquid phase used in at least one of the denaturing steps
and/or in at least one of the renaturing steps contains at
least one disulphide-reshuffling system, X.

30. A method according to claim 23, wherein the at least one
disulphide-reshuffling system X is one which is capable of
reducing and/or reshuffling incorrectly formed disulphide
bridges under conditions with respect to concentration of the
denaturing agent at which unfolded and/or misfolded proteins
are denatured and at which there is substantially no
reduction and/or reshuffling of correctly formed disulphide
bridges.

31. A method according to claim 30, wherein the presence of
the disulphide-reshuffling system X in at least one step
results in a ratio between the relative amount of
reduced/reshuffled initially incorrectly formed disulphide
bridges and the relative amount of reduced/reshuffled
initially correctly formed disulphide bridges of at least
1.05.

32. A method according to any of claims 29-31 wherein the
disulphide-reshuffling system contains glutathione,
2-mercaptoethanol or thiocholine, each of which in admixture with its
corresponding symmetrical disulphide.

33. A method according to any of claims 11-32, wherein all
cysteine residues in the polypeptide molecules have been
converted to mixed disulphide products of either glutathione,
thiocholine, mercaptoethanol or mercaptoacetic acid, during
at least one of the denaturing/renaturing cycles.

34. A method according to claim 33, wherein the conversion of
the cysteine residues to mixed disulphide products is
accomplished by reacting the fully denatured and fully reduced
ensemble of polypeptide molecules with an excess of a reagent
which is a high-energy mixed disulphide compound.

35. A method according to claim 34, wherein the mixed high 
energy disulphide compounds are aliphatic-aromatic. 



36. A method according to claim 34 or 35, wherein the mixed
high energy disulphide compounds have the general formula:

           R2
           |
    R1-S-S-C-R3
           |
           R4

wherein R1 is 2-pyridyl, R2, R3 and R4 are hydrogen or an
optionally substituted lower aromatic or aliphatic
hydrocarbon group.



37. A method according to any of claims 34-36, wherein the
high-energy mixed disulphide compounds are selected from the
group consisting of glutathionyl-2-thiopyridyl disulphide,
2-thiocholyl-2-thiopyridyl disulphide,
2-mercaptoethanol-2-thiopyridyl disulphide and
mercaptoacetate-2-thiopyridyl disulphide.

38. A method according to any of claims 11-37, wherein the
polarity of the liquid phase used in the renaturing of the
polypeptide molecules has been modified by the addition of a
salt, a polymer and/or a hydrofluoro compound, such as
trifluoroethanol.

39. A method according to any of claims 1-24 or 29-38,
wherein the denaturing and renaturing of the polypeptide molecules
is accomplished by direct changes in physical parameters to
which the polypeptide molecules are exposed, such as
temperature or pressure.

40. A method according to claim 25, wherein the chemical
changes in the liquid phase are accomplished by changing
between a denaturing solution B and a renaturing solution A.

41. A method according to claim 40, wherein the concentration
of one or more denaturing compounds in B is adjusted after
each cycle.

42. A method according to claim 41, wherein the concentration
of one or more denaturing compounds in B is decremented after
each cycle.

43. A method according to claim 40, wherein the concentration
of one or more denaturing compounds in medium B is kept
constant in each cycle.
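
The alternation between a denaturing solution B and a renaturing solution A, with the denaturant concentration in B either decremented after each cycle or kept constant, can be written down as a simple schedule. The following Python sketch is illustrative only: the function name, the parameter names, the starting concentration and the step durations are assumptions chosen for the example and are not values prescribed by the claims (the default durations merely fall within the ranges recited in claim 10).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    cycle: int               # 1-based cycle number
    solution: str            # "B" = denaturing, "A" = renaturing
    denaturant_molar: float  # denaturant concentration in the solution (M)
    minutes: float           # time the conditions are held constant


def cyclic_schedule(
    n_cycles: int,
    start_molar: float,
    decrement_molar: float = 0.0,  # 0.0 keeps solution B constant
    denature_minutes: float = 5.0,   # assumed value for illustration
    renature_minutes: float = 30.0,  # assumed value for illustration
    renaturing_molar: float = 0.0,   # denaturant left in solution A
) -> List[Step]:
    """Build a B/A step list for a series of denaturing/renaturing cycles."""
    if n_cycles < 2:
        raise ValueError("a series needs at least two successive cycles")
    steps: List[Step] = []
    for i in range(n_cycles):
        # Lower the denaturant level in B after each completed cycle,
        # never going below zero.
        conc_b = max(start_molar - i * decrement_molar, 0.0)
        steps.append(Step(i + 1, "B", conc_b, denature_minutes))
        steps.append(Step(i + 1, "A", renaturing_molar, renature_minutes))
    return steps


# Hypothetical example: three cycles, 6.0 M denaturant lowered by 0.5 M per cycle.
for step in cyclic_schedule(3, start_molar=6.0, decrement_molar=0.5):
    print(step)
```

Setting `decrement_molar=0.0` gives the constant-concentration variant; a positive value gives the decremented variant.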

44. A method according to any of the preceding claims in
which the polypeptide molecules of the ensemble have a length
of at least 25 amino acid residues and at most 5000 amino
acid residues.

45. A method according to any of the preceding claims,
wherein the polypeptides of the initial ensemble are artificial
polypeptides produced in prokaryotic cells by means of
recombinant DNA techniques.

46. A method according to claim 45, wherein the initial
sample of polypeptide molecules are unfolded or misfolded
diabody molecules (artificial bispecific and bivalent
antibody fragments) or monomer fragments of diabody molecules.

47. A method for producing correctly folded diabody
molecules, wherein an initial ensemble of polypeptide molecules
comprising unfolded and/or misfolded polypeptides having
amino acid sequences identical to monomer fragments of
diabody molecules is subjected to a series of at least two
successive cycles each of which comprises a sequence of

1) at least one denaturing step involving conditions
exerting a denaturing influence on the polypeptide
molecules of the ensemble followed by

2) at least one renaturing step involving conditions
having a renaturing influence on the polypeptide
molecules having conformations resulting from the preceding
step,

the series of cycles being so adapted that a substantial
fraction of the initial ensemble of polypeptide molecules is
converted to a fraction of correctly folded diabody
molecules.

48. A method according to claim 47, wherein the polypeptide 
molecules are in contact with a liquid phase containing at 
least one disulphide reshuffling system in at least one 
denaturing/renaturing cycle. 



49. A polypeptide which is a proenzyme of a serine protease,
which proenzyme has an amino acid sequence different from
that of bovine coagulation factor X (Protein Identification
Resource (PIR), National Biomedical Research Foundation,
Georgetown University, Medical Center, U.S.A., entry:
P1;EXB0) and which can be proteolytically activated to
generate the active serine protease by incubation of a solution
of the polypeptide in a non-denaturing buffer with a
substance that cleaves the polypeptide to liberate a new
N-terminal residue,

the substrate specificity of the serine protease being
identical to or better than that of bovine blood
coagulation factor Xa, as assessed by each of the ratios
(k(I)/k(V) and k(III)/k(V)) between cleavage rate, k,
against each of the substrates I and III:

I: Benzoyl-Val-Gly-Arg-paranitroanilide,

III: Tosyl-Gly-Pro-Arg-paranitroanilide,

versus that against the substrate

V: Benzoyl-Ile-Glu-Gly-Arg-paranitroanilide

at 20°C, pH=8 in a buffer consisting of 50 mM Tris, 100
mM NaCl, 1 mM CaCl2, being identical to or lower than the
corresponding ratio determined for bovine coagulation
factor Xa which is substantially free from contaminating
proteases.

50. A polypeptide according to claim 49, wherein k(I)/k(V)
is at most 0.04 and k(III)/k(V) is at most 0.15.

51. A polypeptide according to claim 49, the substrate
specificity of which is identical to or better than that of bovine
blood coagulation factor Xa, as assessed by each of the
ratios (k(I)/k(V), k(II)/k(V), k(III)/k(V) and k(IV)/k(V))
between cleavage rate, k, against each of the substrates
I-IV:

I: Benzoyl-Val-Gly-Arg-paranitroanilide,

II: Tosyl-Gly-Pro-Lys-paranitroanilide,

III: Tosyl-Gly-Pro-Arg-paranitroanilide,

IV: (d,l) Val-Leu-Arg-paranitroanilide

versus that against the substrate

V: Benzoyl-Ile-Glu-Gly-Arg-paranitroanilide

at 20°C, pH=8 in a buffer consisting of 50 mM Tris, 100 mM
NaCl, 1 mM CaCl2, being identical to or lower than the
corresponding ratio determined for bovine coagulation factor Xa
which is substantially free from contaminating proteases.

52. A polypeptide according to claim 51, wherein k(I)/k(V)
is at most 0.04, k(II)/k(V) is at most 0.015, k(III)/k(V) is
at most 0.15, and k(IV)/k(V) is at most 0.005.
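
The specificity criterion is a set of rate ratios, each cleavage rate against substrates I-IV being divided by the rate against substrate V and compared with an upper limit. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the example rates are invented purely to exercise the calculation; they are not measurements. Only the limits in `RATIO_LIMITS` are taken from the ratios recited above.

```python
from typing import Dict

# Upper limits on k(substrate)/k(V), as recited for substrates I-IV.
RATIO_LIMITS: Dict[str, float] = {
    "I": 0.04,
    "II": 0.015,
    "III": 0.15,
    "IV": 0.005,
}


def specificity_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Return k(s)/k(V) for every substrate s other than V."""
    k_v = rates["V"]
    if k_v <= 0:
        raise ValueError("the rate against substrate V must be positive")
    return {s: k / k_v for s, k in rates.items() if s != "V"}


def meets_limits(rates: Dict[str, float],
                 limits: Dict[str, float] = RATIO_LIMITS) -> bool:
    """True if every ratio named in `limits` is at or below its limit."""
    ratios = specificity_ratios(rates)
    return all(ratios[s] <= limit for s, limit in limits.items())


# Invented rates in arbitrary units, for illustration only.
example = {"I": 0.02, "II": 0.01, "III": 0.10, "IV": 0.004, "V": 1.0}
print(specificity_ratios(example))
print(meets_limits(example))  # True
```

Passing a reduced dictionary such as `{"I": 0.04, "III": 0.15}` as `limits` restricts the check to the two-substrate form of the criterion.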

53. A polypeptide according to any of claims 49-52, which
polypeptide has a molecular weight of at most 70,000 and
of at least 15,000.

54. A polypeptide according to any of claims 49-53, which has 
an amino acid sequence which is a subsequence of SEQ ID NO: 2 
or an analogue of such a subsequence. 

55. A polypeptide according to claim 54 which has a sequence
homology at the polypeptide level of at least 60% identity
compared to a segment of SEQ ID NO: 2, allowing for deletions
and/or insertions of at most 50 amino acid residues.

56. A polypeptide according to claim 54 which has an amino
acid sequence consisting of residues 82-484 or residues
166-484 of SEQ ID NO: 2.

57. A nucleic acid fragment which is capable of encoding a
polypeptide according to any of claims 54-56, such as a DNA
fragment.

58. A nucleic acid fragment according to claim 57, in which
at least 60% of the coding triplets encode the same amino
acids as a nucleic acid fragment of the nucleic acid which
encodes bovine coagulation factor X, allowing for insertions
and/or deletions of at most 150 nucleotides.

59. A nucleic acid fragment according to claim 57 which has a
nucleotide sequence selected from the group consisting of
nucleotides 76-1527, nucleotides 319-1527, or nucleotides
571-1527 of SEQ ID NO: 1, or an analogue thereof.

60. An expression system comprising a nucleic acid fragment
according to any of claims 57-59 encoding a polypeptide
according to any of claims 54-56, the system comprising a
5'-flanking sequence capable of mediating expression of said
nucleotide sequence.

61. A replicable expression vector carrying a nucleic acid
fragment according to any of claims 57-59, which vector is
capable of replicating in a host organism or a cell line, the
vector being such as a plasmid, phage, cosmid,
mini-chromosome or virus.

62. A vector according to claim 61 which, when introduced in
a host cell, is integrated in the host cell genome.

63. An organism which carries and is capable of replicating 
the nucleic acid fragment according to any of claims 57-59. 

64. An organism according to claim 63, which is a
microorganism such as a bacterium, a yeast, a protozoan, or a cell
derived from a multicellular organism such as a fungus, an
insect cell, a plant cell, a mammalian cell or a cell line.

65. A method of producing a polypeptide as defined in any of
claims 54-56, comprising the following steps of:

a. inserting a nucleic acid fragment as defined in any of
claims 57-59 in an expression vector,

b. transforming a host organism according to claim 63 or
64 with the vector produced in step a,

c. culturing the host organism produced in step b to
express the polypeptide,

d. harvesting the polypeptide,

e. optionally subjecting the polypeptide to
post-translational modification,

f. subjecting the polypeptide to a method according to
any of claims 1-48, and

g. optionally subjecting the polypeptide to further
modification.

66. The use of a polypeptide according to any of claims 54-56
for cleaving polypeptides at the cleavage site for bovine
coagulation factor Xa, the cleavage site having the amino
acid sequence selected from the group consisting of SEQ ID
NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42.

67. The use of a polypeptide according to any of claims 54-56
for cleaving polypeptides at the cleavage site for bovine
coagulation factor Xa, the cleavage site having a modified
version of the amino acid sequence selected from the group of
SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45 and SEQ ID NO:
46, which has been converted to a cleavable form according to
the method in claim 23.

68. The use of a polypeptide according to any of claims 54-56
in a method according to claim 18, 19 or 24 for cleaving
polypeptides at the specific FXa recognition site, the
cleaving site having the amino acid sequence SEQ ID NO: 38.

α2MR:

GSIEGRAI C R *
#1 GATCCATCGAGGGTAGGGCTATC TGCCGATA

GSIEGRAI K A *
GATCCATCGAGGGTAGGGCTATC AAGGCCTA

GSIEGRAI K K *
GATCCATCGAGGGTAGGGCTATC AAGAAGTA

M G S HHHHHH
CATATGGGATCGCATCACCATCACCATCACG AGCTTGAATTC

[Plasmid map: pT7H6, showing the phage T7 promoter, the BamHI and HindIII sites, and the Amp r marker]

Fig. 6

α2MR:

GSIEGRGT L D *
#4 GATCCATCGAGGGTAGGGGCACC CTGGACTA

GSIEGRVP DQ*
#5 GATCCATCGAGGGTAGGGTGCCT GACCAGTA

GSIEGRGGQC F K *
GATCAATCGAGGGTAGGGGTGGTCAGTGC TTTAAGTA

Fig. 7

α2MR:

GSIEGRGT F K *
#7 GATCCATCGAGGGTAGGGGCACC TTTAAGTA

GSIEGRAV HI*
#8 GATCCATCGAGGGTAGGGCGGTG CACATCTA

GSIEGRVS SI*
GATCCATCGAGGGTAGGGTGTCC AGCATCTA

Fig. 8

α2-Macroglobulin Receptor.

1 MLTPPLLLLLPLLSALVAAAIDAPKTCSPKQFACRDQITCISKGWRCDGERDCPDGSDEA 

109 

61 PEICPQSKAQRCQPNEHNCLGTELCVPMSRLCNGVQDCMDGSDEGPHCJRELQGNCSRLGC 
121 QHHCVPTLDGPTCYCNSSFQLQADGKTCKDFDECSVYGTCSQLCTNTDGSFICGCVEGYL 

iafl 

181 LQPDNRSCKAKNEPVDRPPVLLIANSQNILATYLSGAQVSTITPTSTRQTTAMDFSYANE
241 TVCWVHVGDSAAQTQI^CAI^GLKGFVDEHTINISLSLHHVEQMAIDWLTGNFYFVDDI 
301 DDRIFVCNRNGDTCVTLLDLELYNPKGIALDPAMGKVFFTDYGQIPKVERCDMDGQNRTK
361 LVDSKIVFPHGITLDLVSRLVYWADAYLDYIEWDYEGKGRQTIIQGILIEHLYGLTVFE 
421 N YLYATNSDNANAQQKTSVIRVNRFNSTE YQWTRVDKGGALHI YHQRRQPRVRSHACEN 

521 

481 DQYGKPGGCSDICLLANSHKARTCRCRSGFSLGSDGKSCKKPEHELFLVYGKGRPGIIRG 
541 MDMGAKVPDEHMIPIENLMNPRALDFHAETGFIYFADTTSYLIGRQKIDGTERETILKDG 
601 IHNVEGVAVDWMGDNLYWTDDGPKKTISVARLEKAAQTRKTLIEGKMTHPRAIWDPLNG 
661 WMYWTDWEEDPKDSRRGRLERAWMDGSHRDIFVTSKTVLWPNGLSLDIPAGRLYWVDAFY
721 DRIETILLNGTDRKIVYEGPELNHAFGLCHHGNYLFWTEYRSGSVYRLERGVGGAPPTVT

803 

781 LLRSERPP IFEIRMYDAQQQQVOTNKCRVNNGGCSSLCLATPGSRQCACAEDQVLDADGV 
841 TCLANPSYVPPPQCQPGEFACANSRCIQERWKCDGDNDCLDNSDEAPALCHQHTCPSDRF 
901 KCENNRCIPNRWLCDGDNDCGNSEDESNATCSARTCPPNQFSCASGRCIPISWTCDLDDD
961 CGDRSDESASCAYPTCFPLTQFTCNNGRCININWRCDNDNDCGDNSDEAGCSHSCSSTQF 
1021 KCNSGRCIPEHWTCDGDNDCGDYSDETHANCTNQATRPPGGCHTDEFQCRLDGLCIPLRW 
1081 RCDGDTDCMDSSDEKSCEGVTHVCDPSVKFGCKDSARCISKAWVCDGDNDCEDNSDEENC - 

1184 

1141 ESLACRPPSHPCANNTSVCLPPDKLCDGNDDCGDGSDEGELCDQCSLNNGGCSHNCSVAP 
1201 GEGIVCSCPLGMELGPDNHTCQIQSYCAKHLKCSQKCDQNKFSVKCSCYEGWVLEPDGES 
1265 

1261 CRSLOTFKPFIIFSNRHEIRRIDLHKGDYSVLVPGLRNTIALDFHLSQSALYWTDWEDK 
1321 IYRGKLLDNGALTSFEWIQYGLATPEGLAVDWIAGNIYWVESNLDQIEVAKLDGTLRTT
1381 LLAGDIEHPRAIALDPRDGILFWTDWDASLPRIEAASMSGAGRRTVHRETGSGGWPNGLT
1441 VDYLEKRILWIDARSDAIYSARYDGSGHMEVLRGHEFLSHPFAVTLYGGEVYWTDWRTNT
1501 LAKANKWTGHNVTWQRTNTQPFDLQVYHPSRQPMAPNPCEANGGQGPCSHLCLINYNRT 

1582 

1561 VSCACPHLMKLHKDNTTCYEFJQCFLLYARQMEIRGVDLDAPYYNYIISFTVPDIDNVTVL 
1621 DYDAREQRVYWSDVRTQAIKRAFINGTGVETVVSADLPNAHGLAVDWVSRNLFWTSYDTN
1681 KKQINVARLDGSFKNAWQGIJIQPHGLVVHPLRGKLYWTDGDNISMANMDGSNRTLLFSG 
1741 QKGPVGLAIDFPESKLYWISSGNHTINRCNLDGSGLEVIDAMRSQLGKATALAIMGDKLW 
1801 WADQVSEKMGTCSKADGSGSVVLRNSTTLVMHMKVYDESIQLDHKGTNPCSVNNGDCSQL 
1861 CLPTSETTRSCMCTAGYSLRSGQQACEGVGSFLLYSVHEGIRGIPLDPNDKSDALVPVSG 
1921 TSLAVGIDFHAENDTIYWVDMGLSTISRAKRDQTWREDWTNGIGRVEGIAVDWIAGNIY 
1981 WTDQGFDVIEVARLNGSFRYWISQGLDKPRAITVHPEKGYLFWTEWGQYPRIERSRLDG 
2041 TERWLVNVSISWPNGISVDYQDGKLYWCDARTDKIERIDLETGENREWLSSNNMDMFS 
2101 VSVFEDFIYWSDRTHANGSIKRGSKDNATDSVPLRTGIGVQLKDIKVFNRDRQKGTNVCA 
2161 VANGGCQQLCLYRGRGQRACACAHGMLAEDGASCREYAGYLLYSERTILKSIHLSDERNL 
2221 NAPVQPFEDPEHMKNVIALAFDYRAGTSPGTPNRIFFSDIHFGNIQQINDDGSRRITIVE 
2281 NVGSVEGLAYHRGWDTLYWTSYTTSTITRHTVDQTRPGAFERETVITMSGDDHPRAFVLD 
2341 ECQNLMFWTNWNEQHPSIMRAALSGANVLTLIEKDIRTPNGLAIDHRAEKLYFSDATLDK 

2401 IERCEYDGSHRYVILKSEPVHPFGLAVYGEHIFWTDWVRRAVQRANKHVGSNMKLLRVDI

?5?Q 

2461 PQQPMGIIAVANDTNSCELSPCRINNGGCQDLCLLTHQGHVNCSCRGGRILQDDLTCRAV
2521 NSSCRAQDEFECANGECINFSLTCDGVPHCKDKSDEKPSYCNSRRCKKTFRQCSNGRCVS 
2581 NMLWCNGADDCGDGSDEIPCNKTACGVGEFRCRDGTCIGNSSRCNQFVDCEDASDEMNCS 
2641 ATDCSSYFRLGVKGVLFQPCERTSLCYAPSWVCDGANDCGDYSDERDCPGVKRPRCPLNY 
2701 FACPSGRCIPMSWTCDKEDDCEHGEDETHCNKFCSEAQFECQNHRCISKQWLCDGSDDCG 



Fig. 9a 
Fig. 15
Fig. 16